Unverified Commit 95684e1b authored by Bharath Ramsundar's avatar Bharath Ramsundar Committed by GitHub
Browse files

Merge pull request #2338 from SanjeevaRDodlapati/patch-1

Update 14_Modeling_Protein_Ligand_Interactions_With_Atomic_Convolutio…
parents 50992082 bc1a3eec
Loading
Loading
Loading
Loading
+1 −1
Original line number Diff line number Diff line
%% Cell type:markdown id: tags:

# Tutorial Part 14: Modeling Protein-Ligand Interactions with Atomic Convolutions

This deepchem tutorial introduces Atomic Convolutional Model. We'll see the structure of the Atomic Conv Model and write a simple program to run Atomic Convolutions.

### Structure
ACNN’s directly exploit the local three-dimensional structure of molecules to hierarchically learn more complex chemical features by optimizing both the model and featurization simultaneously in an end-to-end fashion.

The atom type convolution makes use of a neighbor-listed distance matrix to extract features encoding local chemical environments from an input representation (Cartesian atomic coordinates) that does not necessarily contain spatial locality. Following are the methods use to build ACNN architecture:

- #### Distance Matrix
The distance matrix R is constructed from the Cartesian atomic coordinates X. It calculates distance from the distance tensor D. The distance matrix construction accepts as input a (N, 3) coordinate matrix C. This matrix is “neighbor listed” into a (N, M) matrix R.

```python
    R = tf.reduce_sum(tf.multiply(D, D), 3)     # D: Distance Tensor
    R = tf.sqrt(R)                              # R: Distance Matrix
    return R
```

- #### Atom type convolution
The output of the atom type convolution is constructed from the distance matrix R and atomic number matrix Z. The matrix R is fed into a (1x1) filter with stride 1 and depth of Na , where Na is the number of unique atomic numbers (atom types) present in the molecular system. The atom type convolution kernel is a step function that operates on neighbor distance matrix R.

- #### Radial Pooling layer
Radial Pooling is basically a dimensionality reduction process which down-samples the output of the atom type convolutions. The reduction process prevents overfitting by providing an abstracted form of representation through feature binning, as well as reducing the number of parameters learned.
Mathematically, radial pooling layers pool over tensor slices (receptive fields) of size (1xMx1) with stride 1 and a depth of Nr, where Nr is the number of desired radial filters.

- #### Atomistic fully connected network
Atomic Conolution layers are stacked by feeding the flattened(N, Na x Nr) output of radial pooling layer into the atom type convolution operation. Finally, we feed the tensor row-wise (per-atom) into a fully-connected network. The
same fully connected weights and biases are used for each atom in a given molecule.

Now that we have seen the structural overview of ACNNs, we'll try to get deeper into the model and see how we can train it and what do we expect as the output.

For the training purpose, we will use the publicly available PDBbind dataset. In this example, every row reflects a protein-ligand complex, and the following columns are present: a unique complex identifier; the SMILES string of the ligand; the binding affinity (Ki) of the ligand to the protein in the complex; a Python list of all lines in a PDB file for the protein alone; and a Python list of all lines in a ligand file for the ligand alone.

## Colab

This tutorial and the rest in this sequence are designed to be done in Google colab. If you'd like to open this notebook in colab, you can use the following link.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/14_Modeling_Protein_Ligand_Interactions_With_Atomic_Convolutions.ipynb)

## Setup

To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment.

%% Cell type:code id: tags:

``` python
!curl -Lo conda_installer.py https://raw.githubusercontent.com/deepchem/deepchem/master/scripts/colab_install.py
import conda_installer
conda_installer.install()
!/root/miniconda/bin/conda info -e
```

%% Output

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  3489  100  3489    0     0  27046      0 --:--:-- --:--:-- --:--:-- 27046

    add /root/miniconda/lib/python3.6/site-packages to PYTHONPATH
    python version: 3.6.9
    fetching installer from https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    done
    installing miniconda to /root/miniconda
    done
    installing rdkit, openmm, pdbfixer
    added omnia to channels
    added conda-forge to channels
    done
    conda packages installation finished!

    # conda environments:
    #
    base                  *  /root/miniconda
    

%% Cell type:code id: tags:

``` python
!pip install --pre deepchem
import deepchem
deepchem.__version__
```

%% Output

    Collecting deepchem
    [?25l  Downloading https://files.pythonhosted.org/packages/b5/d7/3ba15ec6f676ef4d93855d01e40cba75e231339e7d9ea403a2f53cabbab0/deepchem-2.4.0rc1.dev20200805054153.tar.gz (351kB)
    [K     |████████████████████████████████| 358kB 2.8MB/s
    [?25hRequirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from deepchem) (0.16.0)
    Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from deepchem) (1.18.5)
    Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from deepchem) (1.0.5)
    Requirement already satisfied: scikit-learn in /usr/local/lib/python3.6/dist-packages (from deepchem) (0.22.2.post1)
    Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from deepchem) (1.4.1)
    Requirement already satisfied: python-dateutil>=2.6.1 in /usr/local/lib/python3.6/dist-packages (from pandas->deepchem) (2.8.1)
    Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas->deepchem) (2018.9)
    Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.6.1->pandas->deepchem) (1.15.0)
    Building wheels for collected packages: deepchem
      Building wheel for deepchem (setup.py) ... [?25l[?25hdone
      Created wheel for deepchem: filename=deepchem-2.4.0rc1.dev20200805144642-cp36-none-any.whl size=438624 sha256=7e5b9b5d387726c10af3665c3fabc3cf8955c98122717ba2e3ccdb016174e99e
      Stored in directory: /root/.cache/pip/wheels/41/0f/fe/5f2659dc8e26624863654100f689d8f36cae7c872d2b310394
    Successfully built deepchem
    Installing collected packages: deepchem
    Successfully installed deepchem-2.4.0rc1.dev20200805144642

    '2.4.0-rc1.dev'

%% Cell type:code id: tags:

``` python
import deepchem as dc
import os
from deepchem.utils import download_url
```

%% Cell type:code id: tags:

``` python
download_url("https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/pdbbind_core_df.csv.gz")
data_dir = os.path.join(dc.utils.get_data_dir())
dataset_file= os.path.join(dc.utils.get_data_dir(), "pdbbind_core_df.csv.gz")
raw_dataset = dc.utils.save.load_from_disk(dataset_file)
raw_dataset = dc.utils.load_from_disk(dataset_file)
```

%% Cell type:code id: tags:

``` python
print("Type of dataset is: %s" % str(type(raw_dataset)))
print(raw_dataset[:5])
#print("Shape of dataset is: %s" % str(raw_dataset.shape))
```

%% Output

    Type of dataset is: <class 'pandas.core.frame.DataFrame'>
      pdb_id  ... label
    0   2d3u  ...  6.92
    1   3cyx  ...  8.00
    2   3uo4  ...  6.52
    3   1p1q  ...  4.89
    4   3ag9  ...  8.05
    
    [5 rows x 7 columns]

%% Cell type:markdown id: tags:

### Training the Model

%% Cell type:markdown id: tags:

Now that we've seen what our dataset looks like let's go ahead and do some python on this dataset.

%% Cell type:code id: tags:

``` python
import numpy as np
import tensorflow as tf
```

%% Cell type:markdown id: tags:

TODO(rbharath): This tutorial still needs to be fleshed out.

%% Cell type:markdown id: tags:

# Congratulations! Time to join the Community!

Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:

## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.

## Join the DeepChem Gitter
The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!