Unverified Commit e0185ee5 authored by Bharath Ramsundar's avatar Bharath Ramsundar Committed by GitHub
Browse files

Merge pull request #1789 from nd-02110114/fix-tutorial

Refactor Tutorial 
parents 93f053f2 b7f5dbe9
Loading
Loading
Loading
Loading
+1539 −1206

File changed.

Preview size limit exceeded, changes collapsed.

+80 −23
Original line number Original line Diff line number Diff line
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:


# Tutorial Part 2: Learning MNIST Digit Classifiers
# Tutorial Part 2: Learning MNIST Digit Classifiers


In the previous tutorial, we learned some basics of how to load data into DeepChem and how to use the basic DeepChem objects to load and manipulate this data. In this tutorial, you'll put the parts together and learn how to train a basic image classification model in DeepChem. You might ask, why are we bothering to learn this material in DeepChem? Part of the reason is that image processing is an increasingly important part of AI for the life sciences. So learning how to train image processing models will be very useful for using some of the more advanced DeepChem features.
In the previous tutorial, we learned some basics of how to load data into DeepChem and how to use the basic DeepChem objects to load and manipulate this data. In this tutorial, you'll put the parts together and learn how to train a basic image classification model in DeepChem. You might ask, why are we bothering to learn this material in DeepChem? Part of the reason is that image processing is an increasingly important part of AI for the life sciences. So learning how to train image processing models will be very useful for using some of the more advanced DeepChem features.


The MNIST dataset contains handwritten digits along with their human annotated labels. The learning challenge for this dataset is to train a model that maps the digit image to its true label. MNIST has been a standard benchmark for machine learning for decades at this point.
The MNIST dataset contains handwritten digits along with their human annotated labels. The learning challenge for this dataset is to train a model that maps the digit image to its true label. MNIST has been a standard benchmark for machine learning for decades at this point.


![MNIST](mnist_examples.png)
![MNIST](https://github.com/deepchem/deepchem/blob/master/examples/tutorials/mnist_examples.png?raw=1)


## Colab
## Colab


This tutorial and the rest in this sequence are designed to be done in Google colab. If you'd like to open this notebook in colab, you can use the following link.
This tutorial and the rest in this sequence are designed to be done in Google colab. If you'd like to open this notebook in colab, you can use the following link.


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/02_Learning_MNIST_Digit_Classifiers.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/02_Learning_MNIST_Digit_Classifiers.ipynb)


## Setup
## Setup


We recommend running this tutorial on Google colab. You'll need to run the following cell of installation commands on Colab to get your environment set up. If you'd rather run the tutorial locally, make sure you don't run these commands (since they'll download and install a new Anaconda python setup)
We recommend running this tutorial on Google colab. You'll need to run the following cell of installation commands on Colab to get your environment set up. If you'd rather run the tutorial locally, make sure you don't run these commands (since they'll download and install a new Anaconda python setup)


%% Cell type:code id: tags:
%% Cell type:code id: tags:


``` python
``` 
!wget -c https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
%%capture
!chmod +x Anaconda3-2019.10-Linux-x86_64.sh
%tensorflow_version 1.x
!bash ./Anaconda3-2019.10-Linux-x86_64.sh -b -f -p /usr/local
!wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!bash ./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local
!conda install -y -c deepchem -c rdkit -c conda-forge -c omnia deepchem-gpu=2.3.0
!conda install -y -c deepchem -c rdkit -c conda-forge -c omnia deepchem-gpu=2.3.0
import sys
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')
sys.path.append('/usr/local/lib/python3.7/site-packages/')
import deepchem as dc
```
```


%% Cell type:code id: tags:
%% Cell type:code id: tags:


``` python
``` 
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.examples.tutorials.mnist import input_data
```
```


%% Output


%% Cell type:code id: tags:
%% Cell type:code id: tags:


``` python
``` 
# TODO: This is deprecated. Let's replace with a DeepChem native loader for maintainability.
# TODO: This is deprecated. Let's replace with a DeepChem native loader for maintainability.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
```
```


%% Output
%% Output


    WARNING:tensorflow:From <ipython-input-3-a839aeb82f4b>:1: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
    WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please write your own downloading logic.
    WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/contrib/learn/python/learn/datasets/base.py:252: _internal_retry.<locals>.wrap.<locals>.wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use urllib or similar directly.
    Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
    WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use tf.data to implement this functionality.
    Extracting MNIST_data/train-images-idx3-ubyte.gz
    Extracting MNIST_data/train-images-idx3-ubyte.gz
    Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
    WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use tf.data to implement this functionality.
    Extracting MNIST_data/train-labels-idx1-ubyte.gz
    Extracting MNIST_data/train-labels-idx1-ubyte.gz
    WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use tf.one_hot on tensors.
    Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
    Extracting MNIST_data/t10k-images-idx3-ubyte.gz
    Extracting MNIST_data/t10k-images-idx3-ubyte.gz
    Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
    Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
    Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
    WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


%% Cell type:code id: tags:
%% Cell type:code id: tags:


``` python
``` 
import deepchem as dc
import deepchem as dc
import tensorflow as tf
import tensorflow as tf
from tensorflow.keras.layers import Reshape, Conv2D, Flatten, Dense, Softmax
from tensorflow.keras.layers import Reshape, Conv2D, Flatten, Dense, Softmax
```
```


%% Output

    /usr/local/lib/python3.6/dist-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
      warnings.warn(msg, category=FutureWarning)

    WARNING:tensorflow:
    The TensorFlow contrib module will not be included in TensorFlow 2.0.
    For more information, please see:
      * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
      * https://github.com/tensorflow/addons
      * https://github.com/tensorflow/io (for I/O related ops)
    If you depend on functionality not listed there, please file an issue.
    

%% Cell type:code id: tags:
%% Cell type:code id: tags:


``` python
``` 
train = dc.data.NumpyDataset(mnist.train.images, mnist.train.labels)
train = dc.data.NumpyDataset(mnist.train.images, mnist.train.labels)
valid = dc.data.NumpyDataset(mnist.validation.images, mnist.validation.labels)
valid = dc.data.NumpyDataset(mnist.validation.images, mnist.validation.labels)
```
```


%% Cell type:code id: tags:
%% Cell type:code id: tags:


``` python
``` 
keras_model = tf.keras.Sequential([
keras_model = tf.keras.Sequential([
    Reshape((28, 28, 1)),
    Reshape((28, 28, 1)),
    Conv2D(filters=32, kernel_size=5, activation=tf.nn.relu),
    Conv2D(filters=32, kernel_size=5, activation=tf.nn.relu),
    Conv2D(filters=64, kernel_size=5, activation=tf.nn.relu),
    Conv2D(filters=64, kernel_size=5, activation=tf.nn.relu),
    Flatten(),
    Flatten(),
    Dense(1024, activation=tf.nn.relu),
    Dense(1024, activation=tf.nn.relu),
    Dense(10),
    Dense(10),
    Softmax()
    Softmax()
])
])
model = dc.models.KerasModel(keras_model, dc.models.losses.CategoricalCrossEntropy())
model = dc.models.KerasModel(keras_model, dc.models.losses.CategoricalCrossEntropy())
```
```


%% Cell type:code id: tags:
%% Cell type:code id: tags:


``` python
``` 
model.fit(train, nb_epoch=2)
model.fit(train, nb_epoch=2)
```
```


%% Output
%% Output


    WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/deepchem/models/keras_model.py:169: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
    
    WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/deepchem/models/optimizers.py:76: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
    
    WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/deepchem/models/keras_model.py:258: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
    
    WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/deepchem/models/keras_model.py:260: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
    
    WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/deepchem/models/keras_model.py:200: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
    
    WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
    Instructions for updating:
    If using Keras pass *_constraint arguments to layers.

    0.0
    0.0


%% Cell type:code id: tags:
%% Cell type:code id: tags:


``` python
``` 
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import roc_curve, auc
import numpy as np
import numpy as np


print("Validation")
print("Validation")
prediction = np.squeeze(model.predict_on_batch(valid.X))
prediction = np.squeeze(model.predict_on_batch(valid.X))


fpr = dict()
fpr = dict()
tpr = dict()
tpr = dict()
roc_auc = dict()
roc_auc = dict()
for i in range(10):
for i in range(10):
    fpr[i], tpr[i], thresh = roc_curve(valid.y[:, i], prediction[:, i])
    fpr[i], tpr[i], thresh = roc_curve(valid.y[:, i], prediction[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
    roc_auc[i] = auc(fpr[i], tpr[i])
    print("class %s:auc=%s" % (i, roc_auc[i]))
    print("class %s:auc=%s" % (i, roc_auc[i]))
```
```


%% Output
%% Output


    Validation
    Validation
    class 0:auc=0.9998836328172079
    class 0:auc=0.9999057979948827
    class 1:auc=0.9999571662641497
    class 1:auc=0.9999335476621387
    class 2:auc=0.9998310516219043
    class 2:auc=0.9998705637425881
    class 3:auc=0.9999563446718672
    class 3:auc=0.999911789233876
    class 4:auc=0.9999418111793702
    class 4:auc=0.9999623237852037
    class 5:auc=0.9995639983771051
    class 5:auc=0.9998804023326087
    class 6:auc=0.9998478260194437
    class 6:auc=0.9998620230088834
    class 7:auc=0.9998357507660879
    class 7:auc=0.9995460674157303
    class 8:auc=0.9999599342922393
    class 8:auc=0.9998530924048773
    class 9:auc=0.9998551553268534
    class 9:auc=0.9996017892577271


%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:


# Congratulations! Time to join the Community!
# Congratulations! Time to join the Community!


Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:
Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:


## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.
This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.


## Join the DeepChem Gitter
## Join the DeepChem Gitter
The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!
The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!
+169 −508

File changed.

Preview size limit exceeded, changes collapsed.

+146 −867

File changed.

Preview size limit exceeded, changes collapsed.

+36 −828

File changed.

Preview size limit exceeded, changes collapsed.

Loading