Commit 4774b045 authored by Bharath Ramsundar's avatar Bharath Ramsundar

Moves

parent c082f1cf
%% Cell type:markdown id: tags:

# Tutorial Part 3: Modeling Solubility

%% Cell type:markdown id: tags:

Computationally predicting molecular solubility is useful for drug discovery. In this tutorial, we will use the `deepchem` library to fit a simple statistical model that predicts the solubility of drug-like compounds. Fitting this model involves four steps:

1. Loading a chemical dataset, consisting of a series of compounds along with aqueous solubility measurements.
2. Transforming each compound into a feature vector $v \in \mathbb{R}^n$ comprehensible to statistical learning methods.
3. Fitting a simple model that maps feature vectors to estimates of aqueous solubility.
4. Visualizing the results.

## Colab

This tutorial and the rest in this sequence are designed to be run in Google Colab. If you'd like to open this notebook in Colab, you can use the following link.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/03_Modeling_Solubility.ipynb)

## Setup

To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment.

%% Cell type:code id: tags:

``` python
!wget -c https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
!chmod +x Anaconda3-2019.10-Linux-x86_64.sh
!bash ./Anaconda3-2019.10-Linux-x86_64.sh -b -f -p /usr/local
!conda install -y -c deepchem -c rdkit -c conda-forge -c omnia deepchem-gpu=2.3.0
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')
```

%% Cell type:markdown id: tags:

We need to load a dataset of aqueous solubility measurements [1] into deepchem. The data is in CSV format and contains SMILES strings, predicted aqueous solubilities, and a number of molecular properties that are extraneous for our purposes. Here is an example line from the dataset:

<table style="width:100%">
  <tr>
    <th> Compound ID </th>
    <th> ESOL predicted log solubility (mols/liter) </th>
      <th> Minimum Degree </th>
      <th> Molecular Weight </th>
      <th> # H-Bond Donors </th>
      <th> # Rings </th>
      <th> # Rotatable Bonds </th>
      <th> Polar Surface Area </th>
      <th> Measured log solubility (mols/liter) </th>
      <th> smiles </th>
  </tr>
  <tr>
    <td>benzothiazole</td>
    <td>-2.733</td>
    <td>2</td>
      <td> 135.191 </td>
      <td> 0 </td>
      <td> 2 </td>
      <td> 0 </td>
      <td> 12.89 </td>
      <td> -1.5 </td>
      <td> c2ccc1scnc1c2 </td>
  </tr>

</table>


Most of these fields are not useful for our purposes. The two fields that we will need are the "smiles" field and the "measured log solubility in mols per litre" field. The "smiles" field holds a SMILES string [2] that specifies the compound in question. Before we load this data into deepchem, we will load it into Python and do some simple preliminary analysis to gain some intuition for the dataset.

%% Cell type:code id: tags:

``` python
from deepchem.utils.save import load_from_disk

dataset_file= "../../datasets/delaney-processed.csv"
dataset = load_from_disk(dataset_file)
print("Columns of dataset: %s" % str(dataset.columns.values))
print("Number of examples in dataset: %s" % str(dataset.shape[0]))
```

%% Output

    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
      warnings.warn(msg, category=FutureWarning)
    RDKit WARNING: [18:28:27] Enabling RDKit 2019.09.3 jupyter extensions
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint8 = np.dtype([("qint8", np.int8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint16 = np.dtype([("qint16", np.int16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint32 = np.dtype([("qint32", np.int32, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      np_resource = np.dtype([("resource", np.ubyte, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint8 = np.dtype([("qint8", np.int8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint16 = np.dtype([("qint16", np.int16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint32 = np.dtype([("qint32", np.int32, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      np_resource = np.dtype([("resource", np.ubyte, 1)])

    Columns of dataset: ['Compound ID' 'ESOL predicted log solubility in mols per litre'
     'Minimum Degree' 'Molecular Weight' 'Number of H-Bond Donors'
     'Number of Rings' 'Number of Rotatable Bonds' 'Polar Surface Area'
     'measured log solubility in mols per litre' 'smiles']
    Number of examples in dataset: 1128
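
%% Cell type:markdown id: tags:

Only two of these columns matter for the rest of this tutorial: "smiles" and "measured log solubility in mols per litre". Here is a quick look at just those two (a minimal sketch; it assumes `load_from_disk` returned a pandas DataFrame, which it does for CSV inputs):

%% Cell type:code id: tags:

``` python
# Peek at the two columns we will actually use: the SMILES string and the measured log-solubility.
print(dataset[["smiles", "measured log solubility in mols per litre"]].head())
```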

%% Cell type:markdown id: tags:

To gain a visual understanding of the compounds in our dataset, let's draw them using RDKit. We define a couple of helper functions to get started.

%% Cell type:code id: tags:

``` python
import tempfile
from rdkit import Chem
from rdkit.Chem import Draw
from itertools import islice
from IPython.display import Image, HTML, display

def display_images(filenames):
    """Helper to pretty-print images."""
    imagesList=''.join(
        ["<img style='width: 140px; margin: 0px; float: left; border: 1px solid black;' src='%s' />"
         % str(s) for s in sorted(filenames)])
    display(HTML(imagesList))

def mols_to_pngs(mols, basename="test"):
    """Helper to write RDKit mols to png files."""
    filenames = []
    for i, mol in enumerate(mols):
        filename = "%s%d.png" % (basename, i)
        Draw.MolToFile(mol, filename)
        filenames.append(filename)
    return filenames
```

%% Cell type:markdown id: tags:

Now, we display some compounds from the dataset:

%% Cell type:code id: tags:

``` python
num_to_display = 14
molecules = []
for _, data in islice(dataset.iterrows(), num_to_display):
    molecules.append(Chem.MolFromSmiles(data["smiles"]))
display_images(mols_to_pngs(molecules))
```

%% Output


%% Cell type:markdown id: tags:

Analyzing the distribution of solubilities shows us a nice spread of data.

%% Cell type:code id: tags:

``` python
%matplotlib inline
import matplotlib
import numpy as np
import matplotlib.pyplot as plt

solubilities = np.array(dataset["measured log solubility in mols per litre"])
n, bins, patches = plt.hist(solubilities, 50, facecolor='green', alpha=0.75)
plt.xlabel('Measured log-solubility in mols/liter')
plt.ylabel('Number of compounds')
plt.title(r'Histogram of solubilities')
plt.grid(True)
plt.show()
```

%% Output


%% Cell type:markdown id: tags:

With our preliminary analysis completed, we return to the original goal of constructing a predictive statistical model of molecular solubility using `deepchem`. The first step in creating such a model is translating each compound into a vectorial format that statistical learning techniques can understand. This process is commonly called featurization. `deepchem` packages a number of commonly used featurizers for user convenience. In this tutorial, we will use ECFP4 fingerprints [3].

`deepchem` offers an object-oriented API for featurization. To get started, we first construct a ```Featurizer``` object. `deepchem` provides the ```CircularFingerprint``` class (a subclass of ```Featurizer```) that performs ECFP4 featurization.

%% Cell type:code id: tags:

``` python
import deepchem as dc

featurizer = dc.feat.CircularFingerprint(size=1024)
```
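
%% Cell type:markdown id: tags:

ECFP4 fingerprints are circular (Morgan) fingerprints of radius 2, hashed into a fixed-length bit vector. To get a feel for what `CircularFingerprint` computes, here is a rough sketch using RDKit directly (RDKit is already installed as a `deepchem` dependency; this is only an illustration of the idea, not necessarily bit-for-bit identical to DeepChem's featurizer):

%% Cell type:code id: tags:

``` python
from rdkit import Chem
from rdkit.Chem import AllChem

# ECFP4 corresponds to a Morgan fingerprint of radius 2 (diameter 4).
mol = Chem.MolFromSmiles("c2ccc1scnc1c2")  # benzothiazole, the example compound shown above
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
print("%d of %d bits set" % (fp.GetNumOnBits(), fp.GetNumBits()))
```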

%% Cell type:markdown id: tags:

Now, let's perform the actual featurization. `deepchem` provides the ```CSVLoader``` class for this purpose. The ```featurize()``` method of this class loads data from disk and uses the provided ```Featurizer``` instance to transform the compounds into feature vectors.

To perform machine learning on these samples, we need to convert them into a form suitable for learning algorithms: a data matrix $X \in \mathbb{R}^{n \times d}$, where $n$ is the number of samples and $d$ the dimensionality of the feature vectors, together with a label vector $y \in \mathbb{R}^n$. `deepchem` provides the `Dataset` class to facilitate this transformation. This style lends itself easily to validation-set hyperparameter searches, which we illustrate below.

%% Cell type:code id: tags:

``` python
loader = dc.data.CSVLoader(
      tasks=["measured log solubility in mols per litre"], smiles_field="smiles",
      featurizer=featurizer)
dataset = loader.featurize(dataset_file)
```

%% Output

    Loading raw samples now.
    shard_size: 8192
    About to start loading CSV from ../../datasets/delaney-processed.csv
    Loading shard 1 of size 8192.
    Featurizing sample 0
    Featurizing sample 1000
    TIMING: featurizing shard 0 took 1.230 s
    TIMING: dataset construction took 1.311 s
    Loading dataset from disk.
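
%% Cell type:markdown id: tags:

To connect this back to the notation above, the featurized `Dataset` exposes the data matrix and label vector directly. A quick sanity check (a sketch; for this dataset we expect on the order of 1128 rows, 1024 feature columns, and a single task column):

%% Cell type:code id: tags:

``` python
# X is the n x d matrix of ECFP4 features; y holds the log-solubility labels for the single task.
print("X shape:", dataset.X.shape)
print("y shape:", dataset.y.shape)
```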

%% Cell type:markdown id: tags:

When constructing statistical models, it's necessary to separate the provided data into train and test subsets. The train subset is used to learn the statistical model, while the test subset is used to evaluate the learned model. In practice, it's often useful to elaborate this split further and perform a train/validation/test split. The validation set is used for model selection: proposed models are evaluated on the validation set, and the best-performing model is then tested on the test set.

Choosing the proper method of performing a train/validation/test split can be challenging. Standard machine learning practice is to split the data randomly into train/validation/test, but random splits are not well suited to chemical informatics. For our predictive models to be useful, we require them to have predictive power in portions of chemical space beyond the set of molecules in the training data. Consequently, our models should use splits of the data that separate compounds in the training set from those in the validation and test sets. We use Bemis-Murcko scaffolds [5] to perform this separation: all compounds that share an underlying molecular scaffold are placed into the same subset of the train/validation/test split.
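
To make the scaffold idea concrete, the short sketch below (plain RDKit, with a few hypothetical example SMILES) shows how a Bemis-Murcko scaffold is extracted; compounds that map to the same scaffold string end up in the same subset.

%% Cell type:code id: tags:

``` python
from rdkit.Chem.Scaffolds import MurckoScaffold

# Compounds sharing a scaffold (here, a plain benzene ring) are grouped together by the splitter.
for smiles in ["Cc1ccc(O)cc1", "CCc1ccc(N)cc1", "c2ccc1scnc1c2"]:
    print(smiles, "->", MurckoScaffold.MurckoScaffoldSmiles(smiles=smiles))
```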

%% Cell type:code id: tags:

``` python
splitter = dc.splits.ScaffoldSplitter(dataset_file)
train_dataset, valid_dataset, test_dataset = splitter.train_valid_test_split(
    dataset)
```

%% Output

    Computing train/valid/test indices
    About to generate scaffolds
    Generating scaffold 0/1128
    Generating scaffold 1000/1128
    About to sort in scaffold sets
    TIMING: dataset construction took 0.053 s
    Loading dataset from disk.
    TIMING: dataset construction took 0.027 s
    Loading dataset from disk.
    TIMING: dataset construction took 0.023 s
    Loading dataset from disk.

%% Cell type:markdown id: tags:

Let's visually inspect some of the molecules in the separate splits to verify that they appear structurally dissimilar. Each split is a `Dataset` whose `ids` field holds the SMILES strings of the underlying compounds, which we can feed back into RDKit for drawing.

%% Cell type:code id: tags:

``` python
train_mols = [Chem.MolFromSmiles(compound)
              for compound in train_dataset.ids]
display_images(mols_to_pngs(train_mols[:10], basename="train"))
```

%% Output


%% Cell type:code id: tags:

``` python
valid_mols = [Chem.MolFromSmiles(compound)
              for compound in valid_dataset.ids]
display_images(mols_to_pngs(valid_mols[:10], basename="valid"))
```

%% Output


%% Cell type:markdown id: tags:

Notice the visual distinction between the train and validation splits. The most common scaffolds are reserved for the train split, with the rarer scaffolds allotted to the validation and test splits.

%% Cell type:markdown id: tags:

The performance of common machine-learning algorithms can be very sensitive to how the data is preprocessed. One common transformation is to normalize the data to have zero mean and unit standard deviation. We will apply this transformation to the log-solubility (as seen above, the log-solubility ranges from roughly -12 to 2).

%% Cell type:code id: tags:

``` python
transformers = [
    dc.trans.NormalizationTransformer(transform_y=True, dataset=train_dataset)]

# Apply each transformer and keep the transformed datasets (transform() returns a new dataset).
for transformer in transformers:
    train_dataset = transformer.transform(train_dataset)
    valid_dataset = transformer.transform(valid_dataset)
    test_dataset = transformer.transform(test_dataset)
```

%% Output

    TIMING: dataset construction took 0.042 s
    Loading dataset from disk.
    TIMING: dataset construction took 0.007 s
    Loading dataset from disk.
    TIMING: dataset construction took 0.010 s
    Loading dataset from disk.
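
%% Cell type:markdown id: tags:

As a quick sanity check (a sketch, reusing the `numpy` import from earlier), the transformed training labels should now have mean close to 0 and standard deviation close to 1:

%% Cell type:code id: tags:

``` python
# The normalized training log-solubilities should be approximately zero-mean and unit-variance.
print("mean: %.3f, std: %.3f" % (np.mean(train_dataset.y), np.std(train_dataset.y)))
```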

%% Cell type:markdown id: tags:

The next step after processing the data is to start fitting simple learning models to our data. `deepchem` provides a number of machine-learning model classes.

In particular, `deepchem` provides a convenience class, ```SklearnModel```, that wraps any machine-learning model available in scikit-learn [6]. Consequently, we will start by building a simple random-forest regressor that attempts to predict the log-solubility from our computed ECFP4 features. To train the model, we instantiate the ```SklearnModel``` object, then call the ```fit()``` method on the ```train_dataset``` we constructed above. We then save the model to disk.

%% Cell type:code id: tags:

``` python
from sklearn.ensemble import RandomForestRegressor

sklearn_model = RandomForestRegressor(n_estimators=100)
model = dc.models.SklearnModel(sklearn_model)
model.fit(train_dataset)
```
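
%% Cell type:markdown id: tags:

To actually save the fitted model to disk as promised above, `SklearnModel` exposes a `save()` method that serializes the underlying scikit-learn estimator into the model's `model_dir` (a minimal sketch, assuming the default `model_dir` is acceptable):

%% Cell type:code id: tags:

``` python
# Persist the fitted random forest; it can be restored later with model.reload().
model.save()
```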

%% Cell type:markdown id: tags:

We next evaluate the model on the validation set to see its predictive power. `deepchem` provides the `Evaluator` class to facilitate this process. To evaluate the constructed `model` object, create a new `Evaluator` instance and call the `compute_model_performance()` method.

%% Cell type:code id: tags:

``` python
from deepchem.utils.evaluate import Evaluator

metric = dc.metrics.Metric(dc.metrics.r2_score)
evaluator = Evaluator(model, valid_dataset, transformers)
r2score = evaluator.compute_model_performance([metric])
print(r2score)
```

%% Output

    computed_metrics: [0.1545180826818584]
    {'r2_score': 0.1545180826818584}

%% Cell type:markdown id: tags:

The performance of this basic random-forest model isn't very strong. To construct stronger models, let's attempt to optimize the hyperparameters (choices made in the model specification) to achieve better performance. For random forests, we can tweak `n_estimators`, which controls the number of trees in the forest, and `max_features`, which controls the number of features to consider when performing a split. We now build a series of `SklearnModel`s with different choices for `n_estimators` and `max_features` and evaluate their performance on the validation set.

%% Cell type:code id: tags:

``` python
def rf_model_builder(model_params, model_dir):
  sklearn_model = RandomForestRegressor(**model_params)
  return dc.models.SklearnModel(sklearn_model, model_dir)
params_dict = {
    "n_estimators": [10, 100],
    "max_features": ["auto", "sqrt", "log2", None],
}

metric = dc.metrics.Metric(dc.metrics.r2_score)
optimizer = dc.hyper.HyperparamOpt(rf_model_builder)
best_rf, best_rf_hyperparams, all_rf_results = optimizer.hyperparam_search(
    params_dict, train_dataset, valid_dataset, transformers,
    metric=metric)
```

%% Output

    Fitting model 1/8
    hyperparameters: {'n_estimators': 10, 'max_features': 'auto'}
    computed_metrics: [0.1317492982076066]
    Model 1/8, Metric r2_score, Validation set 0: 0.131749
    	best_validation_score so far: 0.131749
    Fitting model 2/8
    hyperparameters: {'n_estimators': 10, 'max_features': 'sqrt'}
    computed_metrics: [0.1829534188373444]
    Model 2/8, Metric r2_score, Validation set 1: 0.182953
    	best_validation_score so far: 0.182953
    Fitting model 3/8
    hyperparameters: {'n_estimators': 10, 'max_features': 'log2'}
    computed_metrics: [0.27258654405741267]
    Model 3/8, Metric r2_score, Validation set 2: 0.272587
    	best_validation_score so far: 0.272587
    Fitting model 4/8
    hyperparameters: {'n_estimators': 10, 'max_features': None}
    computed_metrics: [0.18944446190226294]
    Model 4/8, Metric r2_score, Validation set 3: 0.189444
    	best_validation_score so far: 0.272587
    Fitting model 5/8
    hyperparameters: {'n_estimators': 100, 'max_features': 'auto'}
    computed_metrics: [0.17469103530126828]
    Model 5/8, Metric r2_score, Validation set 4: 0.174691
    	best_validation_score so far: 0.272587
    Fitting model 6/8
    hyperparameters: {'n_estimators': 100, 'max_features': 'sqrt'}
    computed_metrics: [0.2994773398901386]
    Model 6/8, Metric r2_score, Validation set 5: 0.299477
    	best_validation_score so far: 0.299477
    Fitting model 7/8
    hyperparameters: {'n_estimators': 100, 'max_features': 'log2'}
    computed_metrics: [0.23888813598857617]
    Model 7/8, Metric r2_score, Validation set 6: 0.238888
    	best_validation_score so far: 0.299477
    Fitting model 8/8
    hyperparameters: {'n_estimators': 100, 'max_features': None}
    computed_metrics: [0.16599011759622173]
    Model 8/8, Metric r2_score, Validation set 7: 0.165990
    	best_validation_score so far: 0.299477
    computed_metrics: [0.943551418893045]
    Best hyperparameters: (100, 'sqrt')
    train_score: 0.943551
    validation_score: 0.299477

%% Cell type:markdown id: tags:

The best model achieves a significantly higher $R^2$ score on the validation set than the first model we constructed. Now, let's perform the same sort of hyperparameter search, but with a simple deep network instead.

%% Cell type:code id: tags:

``` python
import numpy.random

params_dict = {"learning_rate": np.power(10., np.random.uniform(-5, -3, size=1)),
               "decay": np.power(10, np.random.uniform(-6, -4, size=1)),
               "nb_epoch": [20] }
n_features = train_dataset.get_data_shape()[0]
def model_builder(model_params, model_dir):
  model = dc.models.MultitaskRegressor(
    1, n_features, layer_sizes=[1000], dropouts=[.25],
    batch_size=50, **model_params)
  return model

optimizer = dc.hyper.HyperparamOpt(model_builder)
best_dnn, best_dnn_hyperparams, all_dnn_results = optimizer.hyperparam_search(
    params_dict, train_dataset, valid_dataset, transformers,
    metric=metric)
```

%% Output

    Fitting model 1/1
    hyperparameters: {'learning_rate': 4.758104054695717e-05, 'decay': 3.460500159272365e-05, 'nb_epoch': 20}
    WARNING:tensorflow:Entity <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38086e48>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38086e48>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38086e48>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38086e48>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:169: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/optimizers.py:76: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:258: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:260: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
    
    WARNING:tensorflow:Entity <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38086e48>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38086e48>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38086e48>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method SwitchedDropout.call of <deepchem.models.layers.SwitchedDropout object at 0x1a38086e48>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:237: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
    
    computed_metrics: [-0.03822963004964297]
    Model 1/1, Metric r2_score, Validation set 0: -0.038230
    	best_validation_score so far: -0.038230
    computed_metrics: [0.23263860966337802]
    Best hyperparameters: (4.758104054695717e-05, 3.460500159272365e-05, 20)
    train_score: 0.232639
    validation_score: -0.038230

%% Cell type:markdown id: tags:

Now that we have a reasonable choice of hyperparameters, let's evaluate the performance of our best models on the test-set.

%% Cell type:code id: tags:

``` python
rf_test_evaluator = Evaluator(best_rf, test_dataset, transformers)
rf_test_r2score = rf_test_evaluator.compute_model_performance([metric])
print("RF Test set R^2 %f" % (rf_test_r2score["r2_score"]))
```

%% Output

    computed_metrics: [0.41444873687966033]
    RF Test set R^2 0.414449

%% Cell type:code id: tags:

``` python
dnn_test_evaluator = Evaluator(best_dnn, test_dataset, transformers)
dnn_test_r2score = dnn_test_evaluator.compute_model_performance([metric])
print("DNN Test set R^2 %f" % (dnn_test_r2score["r2_score"]))
```

%% Output

    computed_metrics: [0.02077380336345558]
    DNN Test set R^2 0.020774

%% Cell type:markdown id: tags:

Now, let's plot the predicted log-solubilities against the measured log-solubilities for the best models we constructed.

%% Cell type:code id: tags:

``` python
task = "measured log solubility in mols per litre"
predicted_test = best_rf.predict(test_dataset)
true_test = test_dataset.y
plt.scatter(predicted_test, true_test)
plt.xlabel('Predicted log-solubility in mols/liter')
plt.ylabel('True log-solubility in mols/liter')
plt.title(r'RF- predicted vs. true log-solubilities')
plt.show()
```

%% Output


%% Cell type:code id: tags:

``` python
task = "measured log solubility in mols per litre"
predicted_test = best_dnn.predict(test_dataset)
true_test = test_dataset.y
plt.scatter(predicted_test, true_test)
plt.xlabel('Predicted log-solubility in mols/liter')
plt.ylabel('True log-solubility in mols/liter')
plt.title(r'DNN predicted vs. true log-solubilities')
plt.show()
```

%% Output


%% Cell type:markdown id: tags:

[1] Delaney, John S. "ESOL: Estimating aqueous solubility directly from molecular structure." Journal of Chemical Information and Computer Sciences 44.3 (2004): 1000-1005.

[2] Anderson, Eric, Gilman D. Veith, and David Weininger. "SMILES, a line notation and computerized interpreter for chemical structures." US Environmental Protection Agency, Environmental Research Laboratory, 1987.

[3] Rogers, David, and Mathew Hahn. "Extended-connectivity fingerprints." Journal of Chemical Information and Modeling 50.5 (2010): 742-754.

[4] Van Der Walt, Stefan, S. Chris Colbert, and Gael Varoquaux. "The NumPy array: a structure for efficient numerical computation." Computing in Science & Engineering 13.2 (2011): 22-30.

[5] Bemis, Guy W., and Mark A. Murcko. "The properties of known drugs. 1. Molecular frameworks." Journal of Medicinal Chemistry 39.15 (1996): 2887-2893.

[6] Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.
%% Cell type:markdown id: tags:

# Tutorial Part 4: Introduction to Graph Convolutions

In the previous sections of the tutorial, we learned about `Dataset` and `Model` objects. We learned how to load some data into DeepChem from files on disk and also learned some basic facts about molecular data handling. We then dove into some basic deep learning architectures. However, until now, we've stuck with vanilla deep learning architectures and haven't really considered how to handle deep architectures specifically engineered to work with life science data.

In this tutorial, we'll change that by going a little deeper and learn about "graph convolutions." These are one of the most powerful deep learning tools for working with molecular data. The reason for this is that molecules can be naturally viewed as graphs.

![Molecular Graph](basic_graphs.gif)

Note how standard chemical diagrams of the sort we're used to from high school lend themselves naturally to visualizing molecules as graphs. In the remainder of this tutorial, we'll dig into this relationship in significantly more detail. This will give us an under-the-hood understanding of how these systems work.
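
As a tiny illustration of the graph view (a sketch; it assumes RDKit is available, e.g. after running the setup cell below), the atoms of a molecule are its nodes and the bonds are its edges:

%% Cell type:code id: tags:

``` python
from rdkit import Chem

# Atoms are the nodes of the molecular graph; bonds are its edges.
mol = Chem.MolFromSmiles("c2ccc1scnc1c2")  # benzothiazole, used as a hypothetical example
print("nodes (atoms):", [atom.GetSymbol() for atom in mol.GetAtoms()])
print("edges (bonds):", [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()])
```

%% Cell type:markdown id: tags: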

## Colab

This tutorial and the rest in this sequence are designed to be run in Google Colab. If you'd like to open this notebook in Colab, you can use the following link.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/03_Introduction_to_Graph_Convolutions.ipynb)

## Setup

To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment.

%% Cell type:code id: tags:

``` python
!wget -c https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
!chmod +x Anaconda3-2019.10-Linux-x86_64.sh
!bash ./Anaconda3-2019.10-Linux-x86_64.sh -b -f -p /usr/local
!conda install -y -c deepchem -c rdkit -c conda-forge -c omnia deepchem-gpu=2.3.0
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')
```

%% Cell type:markdown id: tags:

OK, now that we have our environment installed, we can import the core `GraphConvModel` class that we'll use throughout this tutorial.

%% Cell type:code id: tags:

``` python
import deepchem as dc
from deepchem.models.graph_models import GraphConvModel
```

%% Output

    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
      warnings.warn(msg, category=FutureWarning)
    RDKit WARNING: [12:33:54] Enabling RDKit 2019.09.3 jupyter extensions
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint8 = np.dtype([("qint8", np.int8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint16 = np.dtype([("qint16", np.int16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint32 = np.dtype([("qint32", np.int32, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      np_resource = np.dtype([("resource", np.ubyte, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint8 = np.dtype([("qint8", np.int8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint16 = np.dtype([("qint16", np.int16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint32 = np.dtype([("qint32", np.int32, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      np_resource = np.dtype([("resource", np.ubyte, 1)])

%% Cell type:markdown id: tags:

Now, let's use the MoleculeNet suite to load the Tox21 dataset. We need to process the data in a way that graph convolutional networks can use; for that, we set the featurizer option to 'GraphConv'. The MoleculeNet call will return a training set, a validation set, and a test set for us to use. The call also returns `transformers`, a list of data transformations that were applied to preprocess the dataset. (Most deep networks are quite finicky and require a set of data transformations to ensure that training proceeds stably.)

%% Cell type:code id: tags:

``` python
# Load Tox21 dataset
tox21_tasks, tox21_datasets, transformers = dc.molnet.load_tox21(featurizer='GraphConv', reload=False)
train_dataset, valid_dataset, test_dataset = tox21_datasets
```

%% Output

    Loading raw samples now.
    shard_size: 8192
    About to start loading CSV from /var/folders/st/ds45jcqj2232lvhr0y9qt5sc0000gn/T/tox21.csv.gz
    Loading shard 1 of size 8192.
    Featurizing sample 0

    RDKit WARNING: [12:34:15] WARNING: not removing hydrogen atom without neighbors

    Featurizing sample 1000
    Featurizing sample 2000
    Featurizing sample 3000
    Featurizing sample 4000
    Featurizing sample 5000
    Featurizing sample 6000
    Featurizing sample 7000
    TIMING: featurizing shard 0 took 9.963 s
    TIMING: dataset construction took 12.151 s
    Loading dataset from disk.
    TIMING: dataset construction took 2.447 s
    Loading dataset from disk.
    TIMING: dataset construction took 1.236 s
    Loading dataset from disk.
    TIMING: dataset construction took 1.171 s
    Loading dataset from disk.
    TIMING: dataset construction took 2.298 s
    Loading dataset from disk.
    TIMING: dataset construction took 0.366 s
    Loading dataset from disk.
    TIMING: dataset construction took 0.258 s
    Loading dataset from disk.
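
%% Cell type:markdown id: tags:

Before training, it's worth a quick look at what came back (a sketch; the Tox21 benchmark defines 12 assay tasks, and each split is a DeepChem `Dataset`):

%% Cell type:code id: tags:

``` python
# Twelve toxicity assay tasks, plus the train/validation/test splits produced by MoleculeNet.
print(tox21_tasks)
print("train/valid/test sizes:", len(train_dataset), len(valid_dataset), len(test_dataset))
```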

%% Cell type:markdown id: tags:

Let's now train a graph convolutional network on this dataset. DeepChem provides the `GraphConvModel` class, which wraps a standard graph convolutional architecture under the hood for user convenience. Let's instantiate an object of this class and train it on our dataset.

%% Cell type:code id: tags:

``` python
n_tasks = len(tox21_tasks)
model = GraphConvModel(n_tasks, batch_size=50, mode='classification')

num_epochs = 10
losses = []
for i in range(num_epochs):
    loss = model.fit(train_dataset, nb_epoch=1)
    print("Epoch %d loss: %f" % (i, loss))
    losses.append(loss)
```

%% Output

    WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/layers.py:222: The name tf.unsorted_segment_sum is deprecated. Please use tf.math.unsorted_segment_sum instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/layers.py:224: The name tf.unsorted_segment_max is deprecated. Please use tf.math.unsorted_segment_max instead.
    
    WARNING:tensorflow:Entity <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:169: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/optimizers.py:76: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:258: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:260: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
    
    WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a3800c940>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>>: AttributeError: module 'gast' has no attribute 'Num'

    WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a36133ac8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a361330b8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a3252e240>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a3324ee10>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method TrimGraphOutput.call of <deepchem.models.graph_models.TrimGraphOutput object at 0x1a304bdda0>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/losses.py:108: The name tf.losses.softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.softmax_cross_entropy instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/losses.py:109: The name tf.losses.Reduction is deprecated. Please use tf.compat.v1.losses.Reduction instead.
    
    WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:318: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where

    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

    Epoch 0 loss: 0.179272
    Epoch 1 loss: 0.179948
    Epoch 2 loss: 0.170968
    Epoch 3 loss: 0.144263
    Epoch 4 loss: 0.154905
    Epoch 5 loss: 0.161570
    Epoch 6 loss: 0.157813
    Epoch 7 loss: 0.144116
    Epoch 8 loss: 0.160063
    Epoch 9 loss: 0.144864

%% Cell type:markdown id: tags:

Let's plot these losses so we can take a look at how the loss changes over the process of training.

%% Cell type:code id: tags:

``` python
import matplotlib.pyplot as plot

plot.ylabel("Loss")
plot.xlabel("Epoch")
x = range(num_epochs)
y = losses
plot.scatter(x, y)
plot.show()
```

%% Output


%% Cell type:markdown id: tags:

We see that the losses fall nicely and give us stable learning.

Let's try to evaluate the performance of the model we've trained. For this, we need to define a metric, a measure of model performance. `dc.metrics` holds a collection of metrics already. For this dataset, it is standard to use the ROC-AUC score, the area under the receiver operating characteristic curve (which measures the tradeoff between the true positive rate and the false positive rate). Luckily, the ROC-AUC score is already available in DeepChem.

To measure the performance of the model under this metric, we can use the convenience function `model.evaluate()`.

%% Cell type:code id: tags:

``` python
import numpy as np
metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)

print("Evaluating model")
train_scores = model.evaluate(train_dataset, [metric], transformers)
print("Training ROC-AUC Score: %f" % train_scores["mean-roc_auc_score"])
valid_scores = model.evaluate(valid_dataset, [metric], transformers)
print("Validation ROC-AUC Score: %f" % valid_scores["mean-roc_auc_score"])
```

%% Output

    Evaluating model
    computed_metrics: [0.8595891713475186, 0.9208810011239563, 0.9147081134144165, 0.8827564343909045, 0.7891199471022603, 0.8785310463438729, 0.8952509980966841, 0.8515397554668475, 0.8856747579741513, 0.8522724355082143, 0.9230658251868931, 0.8917149130167253]
    Training ROC-AUC Score: 0.878759
    computed_metrics: [0.8572700737149359, 0.8533399470899471, 0.8608064442725163, 0.8154550076258261, 0.6855681818181818, 0.7803477303123278, 0.7248182762201454, 0.8583574062331196, 0.8459618554273605, 0.7478649766271126, 0.8969316630338451, 0.8312230835486649]
    Validation ROC-AUC Score: 0.813162

%% Cell type:markdown id: tags:

What's going on under the hood? Could we build `GraphConvModel` ourselves? Of course! The first step is to define the inputs to our model. Conceptually, graph convolutions just require the structure of the molecule in question and a vector of features for every atom that describes the local chemical environment. However, in practice, due to TensorFlow's limitations as a general programming environment, we also have to preprocess some auxiliary information.

`atom_features` holds a feature vector of length 75 for each atom. The other inputs are required to support minibatching in TensorFlow. `degree_slice` is an indexing convenience that makes it easy to locate atoms from all molecules with a given degree. `membership` determines the membership of atoms in molecules (atom `i` belongs to molecule `membership[i]`). `deg_adjs` is a list that contains adjacency lists grouped by atom degree. For more details, check out the [code](https://github.com/deepchem/deepchem/blob/master/deepchem/feat/mol_graphs.py).

To define feature inputs with Keras, we use the `Input` layer. Conceptually, a model is a mathematical graph composed of layer objects. `Input` layers have to be the root nodes of the graph since they constitute its inputs.

%% Cell type:code id: tags:

``` python
import tensorflow as tf
import tensorflow.keras.layers as layers

# Placeholders for the per-atom feature vectors and the bookkeeping arrays described above.
atom_features = layers.Input(shape=(75,))
degree_slice = layers.Input(shape=(2,), dtype=tf.int32)
membership = layers.Input(shape=tuple(), dtype=tf.int32)

# One adjacency-list input for each atom degree from 0 to 10.
deg_adjs = []
for i in range(0, 10 + 1):
    deg_adj = layers.Input(shape=(i+1,), dtype=tf.int32)
    deg_adjs.append(deg_adj)
```
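
%% Cell type:markdown id: tags:

To get a feel for what will eventually be fed into these placeholders, here is a short sketch (assuming `rdkit` is available; it reuses DeepChem's `ConvMolFeaturizer` and mirrors what our data generator will do later) that featurizes two small molecules and inspects the resulting arrays.

%% Cell type:code id: tags:

``` python
# Sketch: featurize two molecules and inspect the arrays that will feed the Input layers.
import numpy as np
from rdkit import Chem
from deepchem.feat import ConvMolFeaturizer
from deepchem.feat.mol_graphs import ConvMol

featurizer = ConvMolFeaturizer()
mols = featurizer.featurize([Chem.MolFromSmiles(s) for s in ["CCO", "c1ccccc1"]])
multi = ConvMol.agglomerate_mols(mols)
print(multi.get_atom_features().shape)       # (total number of atoms, 75)
print(multi.deg_slice)                       # start index and size of each degree block
print(np.array(multi.membership))            # molecule index for each atom
print(len(multi.get_deg_adjacency_lists()))  # adjacency lists grouped by atom degree
```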

%% Cell type:markdown id: tags:

Let's now implement the body of the graph convolutional network. DeepChem has a number of layers that encode various graph operations, namely the `GraphConv`, `GraphPool`, and `GraphGather` layers. We will also apply standard neural network layers such as `Dense` and `BatchNormalization`.

The layers we're adding effect a "feature transformation" that will create one vector for each molecule.

%% Cell type:code id: tags:

``` python
from deepchem.models.layers import GraphConv, GraphPool, GraphGather

batch_size = 50

# Two blocks of graph convolution, batch normalization, and graph pooling.
gc1 = GraphConv(64, activation_fn=tf.nn.relu)([atom_features, degree_slice, membership] + deg_adjs)
batch_norm1 = layers.BatchNormalization()(gc1)
gp1 = GraphPool()([batch_norm1, degree_slice, membership] + deg_adjs)
gc2 = GraphConv(64, activation_fn=tf.nn.relu)([gp1, degree_slice, membership] + deg_adjs)
batch_norm2 = layers.BatchNormalization()(gc2)
gp2 = GraphPool()([batch_norm2, degree_slice, membership] + deg_adjs)
# A dense layer on the per-atom features, then GraphGather combines each molecule's atoms into a single vector.
dense = layers.Dense(128, activation=tf.nn.relu)(gp2)
batch_norm3 = layers.BatchNormalization()(dense)
readout = GraphGather(batch_size=batch_size, activation_fn=tf.nn.tanh)([batch_norm3, degree_slice, membership] + deg_adjs)
# Per-task logits reshaped to (n_tasks, 2), followed by a softmax over the two classes.
logits = layers.Reshape((n_tasks, 2))(layers.Dense(n_tasks*2)(readout))
softmax = layers.Softmax()(logits)
```

%% Output

    WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>>: AttributeError: module 'gast' has no attribute 'Num'

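%% Cell type:markdown id: tags:

What is a `GraphConv` layer actually computing? Roughly speaking, each layer updates every atom's feature vector from its own features and the summed features of its neighbors, using learned weights (DeepChem's implementation keeps separate weights per atom degree, which is why the degree-sorted adjacency lists are needed). The toy sketch below is our own simplification of that idea in plain NumPy, not DeepChem's actual code.

%% Cell type:code id: tags:

``` python
# Toy, simplified graph convolution (NOT DeepChem's implementation), just the core idea.
import numpy as np

def toy_graph_conv(atom_feats, neighbor_lists, W_self, W_neigh):
  """atom_feats: (n_atoms, d_in); neighbor_lists: neighbor indices for each atom."""
  n_atoms, d_out = atom_feats.shape[0], W_self.shape[1]
  out = np.zeros((n_atoms, d_out))
  for i, neighbors in enumerate(neighbor_lists):
    neigh_sum = atom_feats[neighbors].sum(axis=0) if neighbors else np.zeros(atom_feats.shape[1])
    out[i] = np.maximum(0.0, atom_feats[i] @ W_self + neigh_sum @ W_neigh)  # ReLU
  return out

# Tiny example: a three-atom chain A-B-C with 4-dimensional atom features.
rng = np.random.RandomState(0)
feats = rng.randn(3, 4)
neighbors = [[1], [0, 2], [1]]
print(toy_graph_conv(feats, neighbors, rng.randn(4, 8), rng.randn(4, 8)).shape)  # (3, 8)
```
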
%% Cell type:markdown id: tags:

Let's now create the `KerasModel`. To do that we specify the inputs and outputs to the model. We also have to define a loss for the model, which tells the network the objective to minimize during training.

%% Cell type:code id: tags:

``` python
inputs = [atom_features, degree_slice, membership] + deg_adjs
outputs = [softmax]
keras_model = tf.keras.Model(inputs=inputs, outputs=outputs)
loss = dc.models.losses.CategoricalCrossEntropy()
model = dc.models.KerasModel(keras_model, loss=loss)
```

%% Cell type:markdown id: tags:

Now that we've successfully defined our graph convolutional model, we need to train it. We can call `fit()`, but we need to make sure that each minibatch of data populates all of the `Input` objects that we've created. For this, we need to create a Python generator that, given a batch of data, produces the lists of inputs, labels, and weights (as NumPy arrays) for that step of training.

%% Cell type:code id: tags:

``` python
from deepchem.metrics import to_one_hot
from deepchem.feat.mol_graphs import ConvMol

def data_generator(dataset, epochs=1, predict=False, pad_batches=True):
  for epoch in range(epochs):
    for ind, (X_b, y_b, w_b, ids_b) in enumerate(
        dataset.iterbatches(
            batch_size, pad_batches=pad_batches, deterministic=True)):
      # Merge the batch of ConvMol objects into a single graph.
      multiConvMol = ConvMol.agglomerate_mols(X_b)
      inputs = [multiConvMol.get_atom_features(), multiConvMol.deg_slice, np.array(multiConvMol.membership)]
      # Append the degree-grouped adjacency lists expected by the GraphConv/GraphPool layers.
      for i in range(1, len(multiConvMol.get_deg_adjacency_lists())):
        inputs.append(multiConvMol.get_deg_adjacency_lists()[i])
      # One-hot labels of shape (batch_size, n_tasks, 2) to match the softmax output.
      labels = [to_one_hot(y_b.flatten(), 2).reshape(-1, n_tasks, 2)]
      weights = [w_b]
      yield (inputs, labels, weights)
```
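
%% Cell type:markdown id: tags:

Before training, it can be useful to pull a single batch out of the generator and check that the shapes look sensible. This is just a quick sanity check of the generator we defined above.

%% Cell type:code id: tags:

``` python
# Sanity check: draw one batch from the generator and inspect the shapes.
inputs, labels, weights = next(data_generator(train_dataset))
print(len(inputs))       # atom features, deg_slice, membership, plus the degree-grouped adjacency lists
print(inputs[0].shape)   # (total number of atoms in the batch, 75)
print(labels[0].shape)   # (batch_size, n_tasks, 2) one-hot labels
print(weights[0].shape)  # (batch_size, n_tasks) per-task example weights
```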

%% Cell type:markdown id: tags:

Now we can train the model using `KerasModel.fit_generator(generator)`, which draws its training batches from the generator we've just defined.

%% Cell type:code id: tags:

``` python
num_epochs = 10
losses = []
for i in range(num_epochs):
  loss = model.fit_generator(data_generator(train_dataset, epochs=1))
  print("Epoch %d loss: %f" % (i, loss))
  losses.append(loss)
```

%% Output

    WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a321f1dd8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a33d597b8>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphConv.call of <deepchem.models.layers.GraphConv object at 0x1a31306668>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphPool.call of <deepchem.models.layers.GraphPool object at 0x1a39232160>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING:tensorflow:Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>>: AttributeError: module 'gast' has no attribute 'Num'
    WARNING: Entity <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method GraphGather.call of <deepchem.models.layers.GraphGather object at 0x1a390c1518>>: AttributeError: module 'gast' has no attribute 'Num'

    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

    Epoch 0 loss: 0.187656
    Epoch 1 loss: 0.176989
    Epoch 2 loss: 0.172129
    Epoch 3 loss: 0.137272
    Epoch 4 loss: 0.159109
    Epoch 5 loss: 0.157422
    Epoch 6 loss: 0.153595
    Epoch 7 loss: 0.144544
    Epoch 8 loss: 0.146739
    Epoch 9 loss: 0.143846

%% Cell type:markdown id: tags:

Let's now plot these losses and take a quick look.

%% Cell type:code id: tags:

``` python
plot.title("Keras Version")
plot.ylabel("Loss")
plot.xlabel("Epoch")
x = range(num_epochs)
y = losses
plot.scatter(x, y)
plot.show()
```

%% Output


%% Cell type:markdown id: tags:

Now that we have trained our graph convolutional network, let's evaluate its performance. We again have to use the generator we defined to feed data to the model.

%% Cell type:code id: tags:

``` python
metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)

def reshape_y_pred(y_true, y_pred):
    """
    GraphConv always pads batches, so we need to remove the predictions
    for the padding samples.  Also, it outputs two values for each task
    (probabilities of positive and negative), but we only want the positive
    probability.
    """
    n_samples = len(y_true)
    return y_pred[:n_samples, :, 1]


print("Evaluating model")
train_predictions = model.predict_on_generator(data_generator(train_dataset, predict=True))
train_predictions = reshape_y_pred(train_dataset.y, train_predictions)
train_scores = metric.compute_metric(train_dataset.y, train_predictions, train_dataset.w)
print("Training ROC-AUC Score: %f" % train_scores)

valid_predictions = model.predict_on_generator(data_generator(valid_dataset, predict=True))
valid_predictions = reshape_y_pred(valid_dataset.y, valid_predictions)
valid_scores = metric.compute_metric(valid_dataset.y, valid_predictions, valid_dataset.w)
print("Valid ROC-AUC Score: %f" % valid_scores)
```

%% Output

    Evaluating model
    computed_metrics: [0.8597628498713714]
    Training ROC-AUC Score: 0.859763
    computed_metrics: [0.7793962313756978]
    Valid ROC-AUC Score: 0.779396

%% Cell type:markdown id: tags:

Success! The model we've constructed behaves nearly identically to `GraphConvModel`. If you're looking to build your own custom models, you can follow the example we've provided here. We hope to see exciting custom models from you soon!

%% Cell type:markdown id: tags:

# Congratulations! Time to join the Community!

Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:

## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
This helps raise awareness of the DeepChem project and of the open source drug discovery tools we're building.

## Join the DeepChem Gitter
The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!