Merge pull request #1180 from nitinprakash96/acnn (92dcaf92) · Commits · 钟慕尧 / deepchem

examples/notebooks/atomic_convolutions.ipynb

0 → 100644

+144 −0

		%% Cell type:markdown id: tags:

		## Atomic Convolutional Model

		%% Cell type:markdown id: tags:

		This DeepChem tutorial introduces the Atomic Convolutional Neural Network (ACNN) model. We'll look at the structure of the model and write a simple program to run atomic convolutions.

		%% Cell type:markdown id: tags:

		### Structure

		%% Cell type:markdown id: tags:

		ACNNs directly exploit the local three-dimensional structure of molecules to hierarchically learn more complex chemical features by optimizing both the model and featurization simultaneously in an end-to-end fashion.

		The atom type convolution makes use of a neighbor-listed distance matrix to extract features encoding local chemical environments from an input representation (Cartesian atomic coordinates) that does not necessarily contain spatial locality. The following components are used to build the ACNN architecture:

		- #### Distance Matrix
		The distance matrix R is constructed from the Cartesian atomic coordinates X, an (N, 3) coordinate matrix for N atoms. The distances themselves are computed from the distance tensor D by taking the norm over its coordinate axis. The coordinate matrix is “neighbor listed” into an (N, M) matrix R, so each row of R holds the distances from one atom to its M neighbors.

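		```python
		import tensorflow as tf

		def distance_matrix(D):
		    # Illustrative wrapper; D is the distance tensor, with x, y, z components on axis 3.
		    R = tf.reduce_sum(tf.multiply(D, D), 3)  # squared distances
		    R = tf.sqrt(R)  # R: distance matrix
		    return R
		```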

		- #### Atom type convolution
		The output of the atom type convolution is constructed from the distance matrix R and the atomic number matrix Z. The matrix R is fed into a (1x1) filter with stride 1 and a depth of Na, where Na is the number of unique atomic numbers (atom types) present in the molecular system. The atom type convolution kernel is a step function that operates on the neighbor distance matrix R.

		- #### Radial Pooling layer
		Radial pooling is a dimensionality reduction step that down-samples the output of the atom type convolutions. The reduction helps reduce overfitting by providing an abstracted representation through feature binning, and it also reduces the number of parameters learned.
		Mathematically, radial pooling layers pool over tensor slices (receptive fields) of size (1xMx1) with stride 1 and a depth of Nr, where Nr is the number of desired radial filters.

		- #### Atomistic fully connected network
		Atomic convolution layers are stacked by feeding the flattened (N, Na x Nr) output of the radial pooling layer into the atom type convolution operation. Finally, we feed the tensor row-wise (per-atom) into a fully connected network. The same fully connected weights and biases are used for each atom in a given molecule.
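
To make the shapes in the four stages above concrete, here is a minimal NumPy sketch on a toy four-atom system. It is not DeepChem's implementation: the function names, the hand-written neighbor list, and the Gaussian-times-cosine-cutoff radial filter (with assumed parameters `r_s`, `sigma` and `r_cut`) are illustrative assumptions, and the random weights stand in for learned parameters.

```python
import numpy as np


def distance_matrix(D):
    """D: (N, M, 3) displacement vectors to M neighbors -> (N, M) distances."""
    return np.sqrt(np.sum(D * D, axis=-1))


def atom_type_convolution(R, Z_nbrs, atom_types):
    """Step-function kernel: route R[i, j] to channel a only if neighbor j has type a.

    R: (N, M) neighbor distances; Z_nbrs: (N, M) neighbor atomic numbers.
    Returns E with shape (N, M, Na) and the boolean mask with the same shape.
    """
    mask = Z_nbrs[:, :, None] == np.asarray(atom_types)[None, None, :]
    E = np.where(mask, R[:, :, None], 0.0)
    return E, mask


def radial_pooling(E, mask, r_s, sigma, r_cut):
    """Pool over the neighbor axis with Nr radial filters -> (N, Na, Nr).

    Assumed filter form: Gaussian centered at r_s times a smooth cosine cutoff.
    """
    r = E[..., None]                                   # (N, M, Na, 1)
    gauss = np.exp(-((r - r_s) ** 2) / sigma ** 2)     # (N, M, Na, Nr)
    cutoff = np.where(r < r_cut, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0), 0.0)
    # The mask removes entries that are zero only because the type did not match.
    filt = gauss * cutoff * mask[..., None]
    return filt.sum(axis=1)


def atomistic_network(P, W1, b1, w2, b2):
    """Apply the same small fully connected network to every atom (row)."""
    X_flat = P.reshape(P.shape[0], -1)                 # (N, Na * Nr)
    H = np.maximum(X_flat @ W1 + b1, 0.0)              # shared ReLU hidden layer
    return H @ w2 + b2                                 # one value per atom, shape (N,)


# Toy system: 4 atoms, each with the other 3 atoms as neighbors (M = 3).
rng = np.random.default_rng(0)
N, M = 4, 3
X = rng.normal(size=(N, 3))                            # Cartesian coordinates
Z = np.array([6, 6, 8, 1])                             # atomic numbers
nbr_idx = np.array([[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]])

D = X[:, None, :] - X[nbr_idx]                         # distance tensor, (N, M, 3)
R = distance_matrix(D)                                 # (N, M)

atom_types = [1, 6, 8]                                 # Na = 3
E, mask = atom_type_convolution(R, Z[nbr_idx], atom_types)

r_s = np.array([0.5, 1.5])                             # Nr = 2 filter centers
P = radial_pooling(E, mask, r_s, sigma=1.0, r_cut=12.0)

W1 = rng.normal(size=(len(atom_types) * len(r_s), 5))  # stand-in for learned weights
b1 = np.zeros(5)
w2 = rng.normal(size=5)
per_atom = atomistic_network(P, W1, b1, w2, 0.0)

print(R.shape, E.shape, P.shape, per_atom.shape)
```

Each neighbor distance lands in exactly one atom-type channel, so `E` has shape (N, M, Na). Pooling over the M axis gives (N, Na, Nr), which is flattened to (N, Na x Nr) before the shared per-atom network is applied.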

		%% Cell type:markdown id: tags:

		Now that we have seen the structural overview of ACNNs, we'll look more closely at the model to see how we can train it and what output to expect.

		For training, we will use the publicly available PDBbind dataset. In this dataset, every row describes a protein-ligand complex, and the following columns are present: a unique complex identifier; the SMILES string of the ligand; the binding affinity (Ki) of the ligand to the protein in the complex; a Python list of all lines in a PDB file for the protein alone; and a Python list of all lines in a ligand file for the ligand alone.

		%% Cell type:code id: tags:

		``` python
		%load_ext autoreload
		%autoreload 2
		%pdb off
		# set DISPLAY = True when running tutorial
		DISPLAY = True
		# set PARALLELIZE to true if you want to use ipyparallel
		PARALLELIZE = False
		import warnings
		warnings.filterwarnings('ignore')
		```

		%% Output

		Automatic pdb calling has been turned OFF

		%% Cell type:code id: tags:

		``` python
		import deepchem as dc
		import os
		from deepchem.utils import download_url
		```

		%% Cell type:code id: tags:

		``` python
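		# Download the PDBbind core set; the file is then read back from DeepChem's data directory.
		download_url("https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/pdbbind_core_df.csv.gz")
		data_dir = dc.utils.get_data_dir()
		dataset_file = os.path.join(data_dir, "pdbbind_core_df.csv.gz")
		# Load the gzipped CSV; as the output below shows, this yields a pandas DataFrame.
		raw_dataset = dc.utils.save.load_from_disk(dataset_file)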
		```

		%% Cell type:code id: tags:

		``` python
		print("Type of dataset is: %s" % str(type(raw_dataset)))
		print(raw_dataset[:5])
		#print("Shape of dataset is: %s" % str(raw_dataset.shape))
		```

		%% Output

		Type of dataset is: <class 'pandas.core.frame.DataFrame'>
		pdb_id smiles \
		0 2d3u CC1CCCCC1S(O)(O)NC1CC(C2CCC(CN)CC2)SC1C(O)O
		1 3cyx CC(C)(C)NC(O)C1CC2CCCCC2C[NH+]1CC(O)C(CC1CCCCC...
		2 3uo4 OC(O)C1CCC(NC2NCCC(NC3CCCCC3C3CCCCC3)N2)CC1
		3 1p1q CC1ONC(O)C1CC([NH3+])C(O)O
		4 3ag9 NC(O)C(CCC[NH2+]C([NH3+])[NH3+])NC(O)C(CCC[NH2...

		complex_id \
		0 2d3uCC1CCCCC1S(O)(O)NC1CC(C2CCC(CN)CC2)SC1C(O)O
		1 3cyxCC(C)(C)NC(O)C1CC2CCCCC2C[NH+]1CC(O)C(CC1C...
		2 3uo4OC(O)C1CCC(NC2NCCC(NC3CCCCC3C3CCCCC3)N2)CC1
		3 1p1qCC1ONC(O)C1CC([NH3+])C(O)O
		4 3ag9NC(O)C(CCC[NH2+]C([NH3+])[NH3+])NC(O)C(CCC...

		protein_pdb \
		0 ['HEADER 2D3U PROTEIN\n', 'COMPND 2D3U P...
		1 ['HEADER 3CYX PROTEIN\n', 'COMPND 3CYX P...
		2 ['HEADER 3UO4 PROTEIN\n', 'COMPND 3UO4 P...
		3 ['HEADER 1P1Q PROTEIN\n', 'COMPND 1P1Q P...
		4 ['HEADER 3AG9 PROTEIN\n', 'COMPND 3AG9 P...

		ligand_pdb \
		0 ['COMPND 2d3u ligand \n', 'AUTHOR GENERA...
		1 ['COMPND 3cyx ligand \n', 'AUTHOR GENERA...
		2 ['COMPND 3uo4 ligand \n', 'AUTHOR GENERA...
		3 ['COMPND 1p1q ligand \n', 'AUTHOR GENERA...
		4 ['COMPND 3ag9 ligand \n', 'AUTHOR GENERA...

		ligand_mol2 label
		0 ['### \n', '### Created by X-TOOL on Thu Aug 2... 6.92
		1 ['### \n', '### Created by X-TOOL on Thu Aug 2... 8.00
		2 ['### \n', '### Created by X-TOOL on Fri Aug 2... 6.52
		3 ['### \n', '### Created by X-TOOL on Thu Aug 2... 4.89
		4 ['### \n', '### Created by X-TOOL on Thu Aug 2... 8.05

		%% Cell type:markdown id: tags:

		### Training the Model

		%% Cell type:markdown id: tags:

		Now that we've seen what our dataset looks like, let's go ahead and write some Python to work with it.
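
As a small first step, the per-complex file contents can be turned back into plain text. The sketch below uses a hypothetical one-row frame that mimics the `pdb_id`, `protein_pdb` and `label` columns shown in the output above. Whether the real cells hold actual lists or their string representations is not stated here, so the illustrative helper `lines_to_text` accepts both.

```python
import ast

import pandas as pd

# Hypothetical stand-in for raw_dataset; the values are shortened for illustration.
toy = pd.DataFrame({
    "pdb_id": ["2d3u"],
    "protein_pdb": ["['HEADER    2D3U PROTEIN\\n', 'END\\n']"],
    "label": [6.92],
})


def lines_to_text(cell):
    """Return file text from a list of lines, or from the string repr of such a list."""
    lines = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return "".join(lines)


pdb_text = lines_to_text(toy.loc[0, "protein_pdb"])
print(pdb_text)
```

The resulting string could then be written to a `.pdb` file or passed to a structure parser, depending on what the featurizer expects.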

		%% Cell type:code id: tags:

		``` python
		import numpy as np
		import tensorflow as tf
		```

		%% Cell type:code id: tags:

		``` python
		```
