Unverified Commit 6c7d02e8 authored by peastman's avatar peastman Committed by GitHub

Merge pull request #2262 from peastman/tutorials

Update more tutorials
parents 53c3b550 871d65c4
+0 −207
%% Cell type:markdown id: tags:

# Tutorial Part 11: Learning Unsupervised Embeddings for Molecules


In this example, we will use a `SeqToSeq` model to generate fingerprints for classifying molecules.  This is based on the following paper, although some of the implementation details are different: Xu et al., "Seq2seq Fingerprint: An Unsupervised Deep Molecular Embedding for Drug Discovery" (https://doi.org/10.1145/3107411.3107424).

Many types of models require their inputs to have a fixed shape.  Since molecules can vary widely in the numbers of atoms and bonds they contain, this makes it hard to apply those models to them.  We need a way of generating a fixed length "fingerprint" for each molecule.  Various ways of doing this have been designed, such as Extended-Connectivity Fingerprints (ECFPs).  But in this example, instead of designing a fingerprint by hand, we will let a `SeqToSeq` model learn its own method of creating fingerprints.
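
The fixed-shape idea can be illustrated with a toy example (this is not how ECFPs actually work, and `toy_fingerprint` is a hypothetical helper, not a DeepChem API): count the occurrences of each character from a shared alphabet, so that SMILES strings of any length map to vectors of the same length.

``` python
def toy_fingerprint(smiles, alphabet):
  # Count how many times each alphabet character appears in the string.
  counts = {c: 0 for c in alphabet}
  for c in smiles:
    if c in counts:
      counts[c] += 1
  # Emit the counts in a fixed order, so every molecule gets the same shape.
  return [counts[c] for c in alphabet]

alphabet = sorted(set('CCO' + 'c1ccccc1'))
print(toy_fingerprint('CCO', alphabet))       # ethanol  -> [0, 2, 1, 0]
print(toy_fingerprint('c1ccccc1', alphabet))  # benzene  -> [2, 0, 0, 6]
```

Both vectors have the same length even though the molecules differ in size; that fixed shape is the property we want, but a learned embedding can capture much richer structure than raw character counts.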

A `SeqToSeq` model performs sequence to sequence translation.  For example, they are often used to translate text from one language to another.  It consists of two parts called the "encoder" and "decoder".  The encoder is a stack of recurrent layers.  The input sequence is fed into it, one token at a time, and it generates a fixed length vector called the "embedding vector".  The decoder is another stack of recurrent layers that performs the inverse operation: it takes the embedding vector as input, and generates the output sequence.  By training it on appropriately chosen input/output pairs, you can create a model that performs many sorts of transformations.

In this case, we will use SMILES strings describing molecules as the input sequences.  We will train the model as an autoencoder, so it tries to make the output sequences identical to the input sequences.  For that to work, the encoder must create embedding vectors that contain all information from the original sequence.  That's exactly what we want in a fingerprint, so perhaps those embedding vectors will then be useful as a way to represent molecules in other models!


## Colab

This tutorial and the rest in this sequence are designed to be done in Google colab. If you'd like to open this notebook in colab, you can use the following link.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/11_Learning_Unsupervised_Embeddings_for_Molecules.ipynb)

## Setup

To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment. This notebook will take a few hours to run on a GPU machine, so we encourage you to run it on Google colab unless you have a good GPU machine available.

%% Cell type:code id: tags:

``` python
!curl -Lo conda_installer.py https://raw.githubusercontent.com/deepchem/deepchem/master/scripts/colab_install.py
import conda_installer
conda_installer.install()
!/root/miniconda/bin/conda info -e
```

%% Output

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  3489  100  3489    0     0   8209      0 --:--:-- --:--:-- --:--:--  8209

    add /root/miniconda/lib/python3.6/site-packages to PYTHONPATH
    all packages is already installed

    # conda environments:
    #
    base                  *  /root/miniconda
    

%% Cell type:code id: tags:

``` python
!pip install --pre deepchem
import deepchem
deepchem.__version__
```

%% Output

    Requirement already satisfied: deepchem in /usr/local/lib/python3.6/dist-packages (2.4.0rc1.dev20200805143219)
    Requirement already satisfied: scikit-learn in /usr/local/lib/python3.6/dist-packages (from deepchem) (0.22.2.post1)
    Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from deepchem) (1.0.5)
    Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from deepchem) (0.16.0)
    Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from deepchem) (1.18.5)
    Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from deepchem) (1.4.1)
    Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas->deepchem) (2018.9)
    Requirement already satisfied: python-dateutil>=2.6.1 in /usr/local/lib/python3.6/dist-packages (from pandas->deepchem) (2.8.1)
    Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.6.1->pandas->deepchem) (1.15.0)

    '2.4.0-rc1.dev'

%% Cell type:markdown id: tags:

Let's start by loading the data.  We will use the MUV dataset.  It includes 74,501 molecules in the training set, and 9,313 molecules in the validation set, so it gives us plenty of SMILES strings to work with.

%% Cell type:code id: tags:

``` python
import deepchem as dc
tasks, datasets, transformers = dc.molnet.load_muv()
train_dataset, valid_dataset, test_dataset = datasets
train_smiles = train_dataset.ids
valid_smiles = valid_dataset.ids
```

%% Cell type:markdown id: tags:

We need to define the "alphabet" for our `SeqToSeq` model, the list of all tokens that can appear in sequences.  (It's also possible for input and output sequences to have different alphabets, but since we're training it as an autoencoder, they're identical in this case.)  Make a list of every character that appears in any training sequence.

%% Cell type:code id: tags:

``` python
tokens = set()
for s in train_smiles:
  tokens = tokens.union(set(c for c in s))
tokens = sorted(list(tokens))
```

%% Cell type:markdown id: tags:

Create the model and define the optimization method to use.  In this case, learning works much better if we gradually decrease the learning rate.  We use an `ExponentialDecay` to multiply the learning rate by 0.9 after each epoch.

%% Cell type:code id: tags:

``` python
from deepchem.models.optimizers import Adam, ExponentialDecay
max_length = max(len(s) for s in train_smiles)
batch_size = 100
batches_per_epoch = len(train_smiles)/batch_size
model = dc.models.SeqToSeq(tokens,
                           tokens,
                           max_length,
                           encoder_layers=2,
                           decoder_layers=2,
                           embedding_dimension=256,
                           model_dir='fingerprint',
                           batch_size=batch_size,
                           learning_rate=ExponentialDecay(0.004, 0.9, batches_per_epoch))
```

%% Cell type:markdown id: tags:

Let's train it!  The input to `fit_sequences()` is a generator that produces input/output pairs.  On a good GPU, this should take a few hours or less.

%% Cell type:code id: tags:

``` python
def generate_sequences(epochs):
  for i in range(epochs):
    for s in train_smiles:
      yield (s, s)

model.fit_sequences(generate_sequences(40))
```

%% Cell type:markdown id: tags:

Let's see how well it works as an autoencoder.  We'll run the first 500 molecules from the validation set through it, and see how many of them are exactly reproduced.

%% Cell type:code id: tags:

``` python
predicted = model.predict_from_sequences(valid_smiles[:500])
count = 0
for s, p in zip(valid_smiles[:500], predicted):
  if ''.join(p) == s:
    count += 1
print('reproduced', count, 'of 500 validation SMILES strings')
```

%% Cell type:markdown id: tags:

Now we'll try using the encoder as a way to generate molecular fingerprints.  We compute the embedding vectors for all molecules in the training and validation datasets, and create new datasets that have those as their feature vectors.  The amount of data is small enough that we can just store everything in memory.

%% Cell type:code id: tags:

``` python
train_embeddings = model.predict_embeddings(train_smiles)
train_embeddings_dataset = dc.data.NumpyDataset(train_embeddings,
                                                train_dataset.y,
                                                train_dataset.w,
                                                train_dataset.ids)

valid_embeddings = model.predict_embeddings(valid_smiles)
valid_embeddings_dataset = dc.data.NumpyDataset(valid_embeddings,
                                                valid_dataset.y,
                                                valid_dataset.w,
                                                valid_dataset.ids)
```

%% Cell type:markdown id: tags:

For classification, we'll use a simple fully connected network with one hidden layer.

%% Cell type:code id: tags:

``` python
classifier = dc.models.MultitaskClassifier(n_tasks=len(tasks),
                                           n_features=256,
                                           layer_sizes=[512])
classifier.fit(train_embeddings_dataset, nb_epoch=10)
```

%% Cell type:markdown id: tags:

Find out how well it worked.  Compute the ROC AUC for the training and validation datasets.

%% Cell type:code id: tags:

``` python
import numpy as np
metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean, mode="classification")
train_score = classifier.evaluate(train_embeddings_dataset, [metric], transformers)
valid_score = classifier.evaluate(valid_embeddings_dataset, [metric], transformers)
print('Training set ROC AUC:', train_score)
print('Validation set ROC AUC:', valid_score)
```

%% Cell type:markdown id: tags:

# Congratulations! Time to join the Community!

Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:

## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.

## Join the DeepChem Gitter
The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!
+4 −4
%% Cell type:markdown id: tags:

# Tutorial Part 14: Conditional Generative Adversarial Network

A Generative Adversarial Network (GAN) is a type of generative model.  It consists of two parts called the "generator" and the "discriminator".  The generator takes random values as input and transforms them into an output that (hopefully) resembles the training data.  The discriminator takes a set of samples as input and tries to distinguish the real training samples from the ones created by the generator.  Both of them are trained together.  The discriminator tries to get better and better at telling real from false data, while the generator tries to get better and better at fooling the discriminator.

A Conditional GAN (CGAN) allows additional inputs to the generator and discriminator that their output is conditioned on.  For example, this might be a class label, and the GAN tries to learn how the data distribution varies between classes.
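
In schematic terms, the condition is usually a one-hot encoded class label concatenated onto the generator's noise input.  A minimal NumPy sketch of that wiring (illustrative only; variable names are made up, and DeepChem handles this for you later in the tutorial):

``` python
import numpy as np

n_classes = 4
labels = np.array([0, 2, 1])                       # batch of class indices
one_hot = np.eye(n_classes)[labels]                # shape (3, 4), one row per sample
noise = np.random.uniform(size=(len(labels), 10))  # shape (3, 10)

# The generator sees noise and condition together in one input vector.
generator_input = np.concatenate([noise, one_hot], axis=1)
print(generator_input.shape)  # (3, 14)
```

The discriminator receives the same one-hot condition alongside each sample, so both networks can learn class-dependent behavior.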

## Colab

This tutorial and the rest in this sequence are designed to be done in Google colab. If you'd like to open this notebook in colab, you can use the following link.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/14_Conditional_Generative_Adversarial_Networks.ipynb)

## Setup

To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment.

%% Cell type:code id: tags:

``` python
!curl -Lo conda_installer.py https://raw.githubusercontent.com/deepchem/deepchem/master/scripts/colab_install.py
import conda_installer
conda_installer.install()
!/root/miniconda/bin/conda info -e
```

%% Cell type:code id: tags:

``` python
!pip install --pre deepchem
import deepchem
deepchem.__version__
```

%% Cell type:markdown id: tags:

For this example, we will create a data distribution consisting of a set of ellipses in 2D, each with a random position, shape, and orientation.  Each class corresponds to a different ellipse.  Let's randomly generate the ellipses.  For each one we select a random center position, X and Y size, and rotation angle.  We then create a transformation matrix that maps the unit circle to the ellipse.

%% Cell type:code id: tags:

``` python
import deepchem as dc
import numpy as np
import tensorflow as tf

n_classes = 4
class_centers = np.random.uniform(-4, 4, (n_classes, 2))
class_transforms = []
for i in range(n_classes):
    xscale = np.random.uniform(0.5, 2)
    yscale = np.random.uniform(0.5, 2)
    angle = np.random.uniform(0, np.pi)
    m = [[xscale*np.cos(angle), -yscale*np.sin(angle)],
         [xscale*np.sin(angle), yscale*np.cos(angle)]]
    class_transforms.append(m)
class_transforms = np.array(class_transforms)
```

%% Cell type:markdown id: tags:

This function generates random data from the distribution.  For each point it chooses a random class, then a random position in that class's ellipse.

%% Cell type:code id: tags:

``` python
def generate_data(n_points):
    classes = np.random.randint(n_classes, size=n_points)
    r = np.random.random(n_points)
    angle = 2*np.pi*np.random.random(n_points)
    points = (r*np.array([np.cos(angle), np.sin(angle)])).T
    points = np.einsum('ijk,ik->ij', class_transforms[classes], points)
    points += class_centers[classes]
    return classes, points
```

%% Cell type:markdown id: tags:

Let's plot a bunch of random points drawn from this distribution to see what it looks like.  Points are colored based on their class label.

%% Cell type:code id: tags:

``` python
%matplotlib inline
import matplotlib.pyplot as plot
classes, points = generate_data(1000)
plot.scatter(x=points[:,0], y=points[:,1], c=classes)
```

%% Output

    <matplotlib.collections.PathCollection at 0x1584692d0>


%% Cell type:markdown id: tags:

Now let's create the model for our CGAN.  DeepChem's GAN class makes this very easy.  We just subclass it and implement a few methods.  The two most important are:

- `create_generator()` constructs a model implementing the generator.  The model takes as input a batch of random noise plus any condition variables (in our case, the one-hot encoded class of each sample).  Its output is a synthetic sample that is supposed to resemble the training data.

- `create_discriminator()` constructs a model implementing the discriminator.  The model takes as input the samples to evaluate (which might be either real training data or synthetic samples created by the generator) and the condition variables.  Its output is a single number for each sample, which will be interpreted as the probability that the sample is real training data.

In this case, we use very simple models.  They just concatenate the inputs together and pass them through a few dense layers.  Notice that the final layer of the discriminator uses a sigmoid activation.  This ensures it produces an output between 0 and 1 that can be interpreted as a probability.

We also need to implement a few methods that define the shapes of the various inputs.  We specify that the random noise provided to the generator should consist of ten numbers for each sample; that each data sample consists of two numbers (the X and Y coordinates of a point in 2D); and that the conditional input consists of `n_classes` numbers for each sample (the one-hot encoded class index).

%% Cell type:code id: tags:

``` python
from tensorflow.keras.layers import Concatenate, Dense, Input

class ExampleGAN(dc.models.GAN):

  def get_noise_input_shape(self):
    return (10,)

  def get_data_input_shapes(self):
    return [(2,)]

  def get_conditional_input_shapes(self):
    return [(n_classes,)]

  def create_generator(self):
    noise_in = Input(shape=(10,))
    conditional_in = Input(shape=(n_classes,))
    gen_in = Concatenate()([noise_in, conditional_in])
    gen_dense1 = Dense(30, activation=tf.nn.relu)(gen_in)
    gen_dense2 = Dense(30, activation=tf.nn.relu)(gen_dense1)
    generator_points = Dense(2)(gen_dense2)
    return tf.keras.Model(inputs=[noise_in, conditional_in], outputs=[generator_points])

  def create_discriminator(self):
    data_in = Input(shape=(2,))
    conditional_in = Input(shape=(n_classes,))
    discrim_in = Concatenate()([data_in, conditional_in])
    discrim_dense1 = Dense(30, activation=tf.nn.relu)(discrim_in)
    discrim_dense2 = Dense(30, activation=tf.nn.relu)(discrim_dense1)
    discrim_prob = Dense(1, activation=tf.sigmoid)(discrim_dense2)
    return tf.keras.Model(inputs=[data_in, conditional_in], outputs=[discrim_prob])

gan = ExampleGAN(learning_rate=1e-4)
```

%% Cell type:markdown id: tags:

Now to fit the model.  We do this by calling `fit_gan()`.  The argument is an iterator that produces batches of training data.  More specifically, it needs to produce dicts that map all data inputs and conditional inputs to the values to use for them.  In our case we can easily create as much random data as we need, so we define a generator that calls the `generate_data()` function defined above for each new batch.

%% Cell type:code id: tags:

``` python
def iterbatches(batches):
  for i in range(batches):
    classes, points = generate_data(gan.batch_size)
    classes = dc.metrics.to_one_hot(classes, n_classes)
    yield {gan.data_inputs[0]: points, gan.conditional_inputs[0]: classes}

gan.fit_gan(iterbatches(5000))
```

%% Output

    Ending global_step 999: generator average loss 0.87121, discriminator average loss 1.08472
    Ending global_step 1999: generator average loss 0.968357, discriminator average loss 1.17393
    Ending global_step 2999: generator average loss 0.710444, discriminator average loss 1.37858
    Ending global_step 3999: generator average loss 0.699195, discriminator average loss 1.38131
    Ending global_step 4999: generator average loss 0.694203, discriminator average loss 1.3871
    TIMING: model fitting took 31.352 s

%% Cell type:markdown id: tags:

Have the trained model generate some data, and see how well it matches the training distribution we plotted before.

%% Cell type:code id: tags:

``` python
classes, points = generate_data(1000)
one_hot_classes = dc.metrics.to_one_hot(classes, n_classes)
gen_points = gan.predict_gan_generator(conditional_inputs=[one_hot_classes])
plot.scatter(x=gen_points[:,0], y=gen_points[:,1], c=classes)
```

%% Output

    <matplotlib.collections.PathCollection at 0x160dedf50>


%% Cell type:markdown id: tags:

# Congratulations! Time to join the Community!

Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:

## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.

## Join the DeepChem Gitter
The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!
+2 −2
%% Cell type:markdown id: tags:

# Tutorial Part 15: Training a Generative Adversarial Network on MNIST


In this tutorial, we will train a Generative Adversarial Network (GAN) on the MNIST dataset.  This is a large collection of 28x28 pixel images of handwritten digits.  We will try to train a network to produce new images of handwritten digits.


## Colab

This tutorial and the rest in this sequence are designed to be done in Google colab. If you'd like to open this notebook in colab, you can use the following link.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/15_Training_a_Generative_Adversarial_Network_on_MNIST.ipynb)

## Setup

To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment.

%% Cell type:code id: tags:

``` python
!curl -Lo conda_installer.py https://raw.githubusercontent.com/deepchem/deepchem/master/scripts/colab_install.py
import conda_installer
conda_installer.install()
!/root/miniconda/bin/conda info -e
```

%% Cell type:code id: tags:

``` python
!pip install --pre deepchem
import deepchem
deepchem.__version__
```

%% Cell type:markdown id: tags:

To begin, let's import all the libraries we'll need and load the dataset (which comes bundled with Tensorflow).

%% Cell type:code id: tags:

``` python
import deepchem as dc
import tensorflow as tf
from deepchem.models.optimizers import ExponentialDecay
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, Dense, Reshape
import matplotlib.pyplot as plot
import matplotlib.gridspec as gridspec
%matplotlib inline

mnist = tf.keras.datasets.mnist.load_data(path='mnist.npz')
images = mnist[0][0].reshape((-1, 28, 28, 1))/255
dataset = dc.data.NumpyDataset(images)
```

%% Cell type:markdown id: tags:

Let's view some of the images to get an idea of what they look like.

%% Cell type:code id: tags:

``` python
def plot_digits(im):
  plot.figure(figsize=(3, 3))
  grid = gridspec.GridSpec(4, 4, wspace=0.05, hspace=0.05)
  for i, g in enumerate(grid):
    ax = plot.subplot(g)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.imshow(im[i,:,:,0], cmap='gray')

plot_digits(images)
```

%% Output


%% Cell type:markdown id: tags:

Now we can create our GAN.  Like in the last tutorial, it consists of two parts:

1. The generator takes random noise as its input and produces output that will hopefully resemble the training data.
2. The discriminator takes a set of samples as input (possibly training data, possibly created by the generator), and tries to determine which are which.

This time we will use a different style of GAN called a Wasserstein GAN (or WGAN for short).  In many cases, they are found to produce better results than conventional GANs.  The main difference between the two is in the discriminator (often called a "critic" in this context).  Instead of outputting the probability of a sample being real training data, it tries to learn how to measure the distance between the training distribution and generated distribution.  That measure can then be directly used as a loss function for training the generator.
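
The two objectives can be written schematically like this (a NumPy sketch under one common sign convention, with arrays standing in for critic outputs; a real WGAN also needs a Lipschitz constraint on the critic, such as weight clipping or a gradient penalty, which DeepChem's `WGAN` class handles for you):

``` python
import numpy as np

critic_real = np.array([2.1, 1.8, 2.4])  # critic scores on training samples
critic_fake = np.array([0.3, 0.5, 0.2])  # critic scores on generated samples

# The critic tries to widen the gap between real and fake scores...
critic_loss = critic_fake.mean() - critic_real.mean()
# ...while the generator tries to raise the scores of its own samples.
generator_loss = -critic_fake.mean()
print(critic_loss, generator_loss)  # critic_loss ~ -1.77, generator_loss ~ -0.33
```

The gap between the two means is an estimate of the distance between the distributions, which is exactly the signal the generator trains against.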

We use a very simple model.  The generator uses a dense layer to transform the input noise into a 7x7 image with eight channels.  That is followed by two convolutional layers that upsample it first to 14x14, and finally to 28x28.

The discriminator does roughly the same thing in reverse.  Two convolutional layers downsample the image first to 14x14, then to 7x7.  A final dense layer produces a single number as output.  In the last tutorial we used a sigmoid activation to produce a number between 0 and 1 that could be interpreted as a probability.  Since this is a WGAN, we instead use a softplus activation.  It produces an unbounded positive number that can be interpreted as a distance.

%% Cell type:code id: tags:

``` python
class DigitGAN(dc.models.WGAN):

  def get_noise_input_shape(self):
    return (10,)

  def get_data_input_shapes(self):
    return [(28, 28, 1)]

  def create_generator(self):
    return tf.keras.Sequential([
        Dense(7*7*8, activation=tf.nn.relu),
        Reshape((7, 7, 8)),
        Conv2DTranspose(filters=16, kernel_size=5, strides=2, activation=tf.nn.relu, padding='same'),
        Conv2DTranspose(filters=1, kernel_size=5, strides=2, activation=tf.sigmoid, padding='same')
    ])

  def create_discriminator(self):
    return tf.keras.Sequential([
        Conv2D(filters=32, kernel_size=5, strides=2, activation=tf.nn.leaky_relu, padding='same'),
        Conv2D(filters=64, kernel_size=5, strides=2, activation=tf.nn.leaky_relu, padding='same'),
        Dense(1, activation=tf.math.softplus)
    ])

gan = DigitGAN(learning_rate=ExponentialDecay(0.001, 0.9, 5000))
```

%% Cell type:markdown id: tags:

Now to train it.  As in the last tutorial, we write a generator to produce data.  This time the data is coming from a dataset, which we loop over 100 times.

One other difference is worth noting.  When training a conventional GAN, it is important to keep the generator and discriminator in balance throughout training.  If either one gets too far ahead, it becomes very difficult for the other one to learn.

WGANs do not have this problem.  In fact, the better the discriminator gets, the cleaner a signal it provides and the easier it becomes for the generator to learn.  We therefore specify `generator_steps=0.2` so that it will only take one step of training the generator for every five steps of training the discriminator.  This tends to produce faster training and better results.
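
A fractional `generator_steps` is just a step-scheduling trick.  A toy sketch of how a value of 0.2 yields one generator step per five discriminator steps (illustrative only, not DeepChem's internal training loop; `schedule` is a made-up helper):

``` python
def schedule(total_discriminator_steps, generator_steps=0.2):
  # Accumulate fractional credit; take a generator step whenever it reaches 1.
  credit, plan = 0.0, []
  for step in range(total_discriminator_steps):
    plan.append('D')
    credit += generator_steps
    if credit >= 1.0:
      plan.append('G')
      credit -= 1.0
  return plan

print(''.join(schedule(10)))  # DDDDDGDDDDDG
```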

%% Cell type:code id: tags:

``` python
def iterbatches(epochs):
  for i in range(epochs):
    for batch in dataset.iterbatches(batch_size=gan.batch_size):
      yield {gan.data_inputs[0]: batch[0]}

gan.fit_gan(iterbatches(100), generator_steps=0.2, checkpoint_interval=5000)
```

%% Output

    Ending global_step 4999: generator average loss 0.340072, discriminator average loss -0.0234236
    Ending global_step 9999: generator average loss 0.52308, discriminator average loss -0.00702729
    Ending global_step 14999: generator average loss 0.572661, discriminator average loss -0.00635684
    Ending global_step 19999: generator average loss 0.560454, discriminator average loss -0.00534357
    Ending global_step 24999: generator average loss 0.556055, discriminator average loss -0.00620613
    Ending global_step 29999: generator average loss 0.541958, discriminator average loss -0.00734233
    Ending global_step 34999: generator average loss 0.540904, discriminator average loss -0.00736641
    Ending global_step 39999: generator average loss 0.524298, discriminator average loss -0.00650514
    Ending global_step 44999: generator average loss 0.503931, discriminator average loss -0.00563732
    Ending global_step 49999: generator average loss 0.528964, discriminator average loss -0.00590612
    Ending global_step 54999: generator average loss 0.510892, discriminator average loss -0.00562366
    Ending global_step 59999: generator average loss 0.494756, discriminator average loss -0.00533636
    TIMING: model fitting took 4197.860 s

%% Cell type:markdown id: tags:

Let's generate some data and see how the results look.

%% Cell type:code id: tags:

``` python
plot_digits(gan.predict_gan_generator(batch_size=16))
```

%% Output


%% Cell type:markdown id: tags:

Not too bad.  Many of the generated images look plausibly like handwritten digits.  A larger model trained for a longer time can do much better, of course.

%% Cell type:markdown id: tags:

# Congratulations! Time to join the Community!

Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:

## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.

## Join the DeepChem Gitter
The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!
+375 −0

File added.

Preview size limit exceeded, changes collapsed.