Unverified Commit f1faf0af authored by Bharath Ramsundar's avatar Bharath Ramsundar Committed by GitHub
Browse files

Merge pull request #1743 from deepchem/numpy

Cleanup NumpyDataset, MNIST, Uncertainty Notebooks
parents 85af7c56 b267156d
Loading
Loading
Loading
Loading
+145 −91

File changed.

Preview size limit exceeded, changes collapsed.

+196 −0
Original line number Diff line number Diff line
%% Cell type:markdown id: tags:

# Train a model on MNIST

%% Cell type:markdown id: tags:

In this tutorial, we'll walk you through a machine-learning 101 example of training a simple convolutional model on the MNIST dataset. We'll use Keras to build a model, but will use DeepChem's dataset module to make dataset handling convenient. This tutorial should also provide a good example of how to mix and match TensorFlow tooling with DeepChem. You'll definitely want to do some of this in your models.

To start, we'll pull in the raw data for MNIST from one of the TensorFlow tutorials.

%% Cell type:code id: tags:

``` python
from tensorflow.examples.tutorials.mnist import input_data
```

%% Output

    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint8 = np.dtype([("qint8", np.int8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint16 = np.dtype([("qint16", np.int16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint32 = np.dtype([("qint32", np.int32, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      np_resource = np.dtype([("resource", np.ubyte, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint8 = np.dtype([("qint8", np.int8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint16 = np.dtype([("qint16", np.int16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint32 = np.dtype([("qint32", np.int32, 1)])
    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      np_resource = np.dtype([("resource", np.ubyte, 1)])

%% Cell type:code id: tags:

``` python
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
```

%% Output

    WARNING:tensorflow:From <ipython-input-2-a839aeb82f4b>:1: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
    WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please write your own downloading logic.
    WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252: _internal_retry.<locals>.wrap.<locals>.wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use urllib or similar directly.
    Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
    WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use tf.data to implement this functionality.
    Extracting MNIST_data/train-images-idx3-ubyte.gz
    Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
    WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use tf.data to implement this functionality.
    Extracting MNIST_data/train-labels-idx1-ubyte.gz
    WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use tf.one_hot on tensors.
    Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
    Extracting MNIST_data/t10k-images-idx3-ubyte.gz
    Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
    Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
    WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use alternatives such as official/mnist/dataset.py from tensorflow/models.

%% Cell type:code id: tags:

``` python
import deepchem as dc
import tensorflow as tf
from tensorflow.keras.layers import Reshape, Conv2D, Flatten, Dense, Softmax
```

%% Output

    /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
      warnings.warn(msg, category=FutureWarning)
    RDKit WARNING: [17:25:52] Enabling RDKit 2019.09.3 jupyter extensions

%% Cell type:markdown id: tags:

Let's wrap these datasets with DeepChem's `NumpyDataset`.

%% Cell type:code id: tags:

``` python
train = dc.data.NumpyDataset(mnist.train.images, mnist.train.labels)
valid = dc.data.NumpyDataset(mnist.validation.images, mnist.validation.labels)
```

%% Cell type:markdown id: tags:

We're going to train a simple convolutional neural network with 2 convolutional layers and two dense layers on this data. We'll use the `tf.keras.Sequential` class to simplify the construction of a model of this type. We're going to wrap this model in `dc.models.KerasModel` which will enable this model to train on DeepChem dataset objects.

%% Cell type:code id: tags:

``` python
keras_model = tf.keras.Sequential([
    Reshape((28, 28, 1)),
    Conv2D(filters=32, kernel_size=5, activation=tf.nn.relu),
    Conv2D(filters=64, kernel_size=5, activation=tf.nn.relu),
    Flatten(),
    Dense(1024, activation=tf.nn.relu),
    Dense(10),
    Softmax()
])
model = dc.models.KerasModel(keras_model, dc.models.losses.CategoricalCrossEntropy())
```

%% Output

    WARNING:tensorflow:From /Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor

%% Cell type:markdown id: tags:

Let's now fit our simple model on the data! We'll run for just 2 epochs so the training doesn't take too long.

%% Cell type:code id: tags:

``` python
model.fit(train, nb_epoch=2)
```

%% Output

    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:169: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/optimizers.py:76: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:258: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:260: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
    
    WARNING:tensorflow:From /Users/bharath/Code/deepchem/deepchem/models/keras_model.py:200: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
    

    0.0

%% Cell type:markdown id: tags:

Let's now try evaluating this model on our validation set. We'll print out the AUC scores for our model.

%% Cell type:code id: tags:

``` python
from sklearn.metrics import roc_curve, auc
import numpy as np

print("Validation")
prediction = np.squeeze(model.predict_on_batch(valid.X))

fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(10):
    fpr[i], tpr[i], thresh = roc_curve(valid.y[:, i], prediction[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
    print("class %s:auc=%s" % (i, roc_auc[i]))
```

%% Output

    Validation
    class 0:auc=0.9999676757825577
    class 1:auc=0.9999515618501131
    class 2:auc=0.9999504963085688
    class 3:auc=0.9999450932986371
    class 4:auc=0.9999581375391153
    class 5:auc=0.9999444905341222
    class 6:auc=0.9999179236548019
    class 7:auc=0.9997867211440246
    class 8:auc=0.9998344904691312
    class 9:auc=0.9995919236762745

%% Cell type:code id: tags:

``` python
```
+13 −7

File changed.

Preview size limit exceeded, changes collapsed.

examples/notebooks/mnist.ipynb

deleted100644 → 0
+0 −99
Original line number Diff line number Diff line
%% Cell type:markdown id: tags:

# MNIST with DeepChem and TensorGraph

%% Cell type:code id: tags:

``` python
from tensorflow.examples.tutorials.mnist import input_data
```

%% Cell type:code id: tags:

``` python
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
```

%% Output

    Extracting MNIST_data/train-images-idx3-ubyte.gz
    Extracting MNIST_data/train-labels-idx1-ubyte.gz
    Extracting MNIST_data/t10k-images-idx3-ubyte.gz
    Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

%% Cell type:code id: tags:

``` python
import deepchem as dc
import tensorflow as tf
from tensorflow.keras.layers import Reshape, Conv2D, Flatten, Dense, Softmax
```

%% Cell type:code id: tags:

``` python
train = dc.data.NumpyDataset(mnist.train.images, mnist.train.labels)
valid = dc.data.NumpyDataset(mnist.validation.images, mnist.validation.labels)
```

%% Cell type:code id: tags:

``` python
keras_model = tf.keras.Sequential([
    Reshape((28, 28, 1)),
    Conv2D(filters=32, kernel_size=5, activation=tf.nn.relu),
    Conv2D(filters=64, kernel_size=5, activation=tf.nn.relu),
    Flatten(),
    Dense(1024, activation=tf.nn.relu),
    Dense(10),
    Softmax()
])
model = dc.models.KerasModel(keras_model, dc.models.losses.CategoricalCrossEntropy())
```

%% Cell type:code id: tags:

``` python
model.fit(train, nb_epoch=2)
```

%% Output

    0.0

%% Cell type:code id: tags:

``` python
from sklearn.metrics import roc_curve, auc
import numpy as np

print("Validation")
prediction = np.squeeze(model.predict_on_batch(valid.X))

fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(10):
    fpr[i], tpr[i], thresh = roc_curve(valid.y[:, i], prediction[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
    print("class %s:auc=%s" % (i, roc_auc[i]))
```

%% Output

    Validation
    class 0:auc=0.9999515136738367
    class 1:auc=0.999956365633573
    class 2:auc=0.9998060726950355
    class 3:auc=0.9999626454408761
    class 4:auc=0.9999271593180605
    class 5:auc=0.9999116894861034
    class 6:auc=0.9999347825797615
    class 7:auc=0.9998377936670072
    class 8:auc=0.9999093751848269
    class 9:auc=0.9996134485812621

%% Cell type:code id: tags:

``` python
```