Commit 81fc8266 authored by Bharath Ramsundar

Tutorial start

parent 0058ab26
+31 −0
# DeepChem Step-by-Step Tutorial

In this tutorial series, you'll learn how to use DeepChem to solve interesting
and challenging problems in the life sciences. The series is designed to be
accessible to beginners and is continually updated as new DeepChem features
and models are implemented.

## Why do the DeepChem Tutorial?

**1) Career Advancement:** Applying AI in the life sciences is a booming
industry at present. There are a host of newly funded startups and initiatives
at large pharmaceutical and biotech companies centered around AI. Learning and
mastering DeepChem will bring you to the forefront of this field and
prepare you for a career in it.

**2) Humanitarian Considerations:** Disease is among the oldest causes of human
suffering. Since the dawn of civilization, humans have suffered from pathogens,
cancers, and neurological conditions. One of the greatest achievements of
the last few centuries has been the development of effective treatments for
many diseases. By mastering the skills in this tutorial, you will be able to
stand on the shoulders of the giants of the past to help develop new
medicines.

**3) Lowering the Cost of Medicine:** The art of developing new medicine is
currently an elite skill that can only be practiced by a small core of expert
practitioners. By enabling the growth of open source tools for drug discovery,
you can help democratize these skills and open up drug discovery to more
competition. Increased competition can help drive down the cost of medicine.

## You Will Learn
* [Part 1: The Basic Tools of the Deep Life Sciences]()
+130 −0
%% Cell type:markdown id: tags:

# Tutorial: Deep Life Sciences
Welcome to DeepChem's introductory tutorial for the deep life sciences. This series of notebooks is a step-by-step guide to the new tools and techniques needed to do deep learning for the life sciences.

**Scope:** This tutorial will encompass both the machine learning and data handling needed to build systems for the deep life sciences.

## Outline
* Part 1: The Basic Tools of the Deep Life Sciences
* Part 2: Introduction to Molecular Data Handling
* Part 3: Molecular Machine Learning
* Part 4:

## Why do the DeepChem Tutorial?

**1) Career Advancement:** Applying AI in the life sciences is a booming
industry at present. There are a host of newly funded startups and initiatives
at large pharmaceutical and biotech companies centered around AI. Learning and
mastering DeepChem will bring you to the forefront of this field and
prepare you for a career in it.

**2) Humanitarian Considerations:** Disease is among the oldest causes of human
suffering. Since the dawn of civilization, humans have suffered from pathogens,
cancers, and neurological conditions. One of the greatest achievements of
the last few centuries has been the development of effective treatments for
many diseases. By mastering the skills in this tutorial, you will be able to
stand on the shoulders of the giants of the past to help develop new
medicines.

**3) Lowering the Cost of Medicine:** The art of developing new medicine is
currently an elite skill that can only be practiced by a small core of expert
practitioners. By enabling the growth of open source tools for drug discovery,
you can help democratize these skills and open up drug discovery to more
competition. Increased competition can help drive down the cost of medicine.

## Getting Extra Credit
* Star DeepChem on GitHub! - https://github.com/deepchem/deepchem
* Make a YouTube video teaching the contents of this notebook.


## Part -1: Prerequisites

This tutorial assumes basic familiarity with the Python data science ecosystem, in particular libraries such as NumPy, pandas, and TensorFlow.

## Part 0: Setup
The first step is to get DeepChem up and running. For now, we recommend installing it with conda:
```
conda install -c deepchem -c rdkit -c conda-forge -c omnia deepchem=2.1.0
```

%% Cell type:code id: tags:

``` python
# Run this cell to see if things work
import deepchem as dc
```

%% Output

    /home/bharath/anaconda3/envs/deepchem/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
      from numpy.core.umath_tests import inner1d

%% Cell type:markdown id: tags:

## The Basic Tools of the Deep Life Sciences
What does it take to do deep learning for the life sciences? The first thing we need is a way to handle data. To begin, let's take a look at some synthetic data.

To generate it, we will use NumPy to create two small arrays of random values.

%% Cell type:code id: tags:

``` python
import numpy as np

data = np.random.random((4, 4))  # array of shape (4, 4)
labels = np.random.random((4,))  # labels of shape (4,)
```

%% Cell type:markdown id: tags:

We've given these arrays some evocative names: "data" and "labels." For now, don't worry too much about the names; just note that the arrays have different shapes. Let's take a quick look to get a feel for these arrays.

%% Cell type:code id: tags:

``` python
data, labels
```

%% Output

    (array([[0.17153735, 0.72653504, 0.75818459, 0.64997769],
            [0.64356789, 0.37895973, 0.46143683, 0.3251195 ],
            [0.51409105, 0.20522909, 0.29532684, 0.35239749],
            [0.49242761, 0.62127102, 0.77898693, 0.90960543]]),
     array([0.01939268, 0.43336842, 0.91222562, 0.23498551]))

%% Cell type:markdown id: tags:
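
The printed values above hint at the difference in shape between the two arrays. As a minimal sketch, the shapes can also be checked directly with NumPy's `shape` and `ndim` attributes. The arrays are regenerated here so the snippet runs on its own, which means the random values will differ from those printed above.

```python
import numpy as np

data = np.random.random((4, 4))
labels = np.random.random((4,))

# `shape` is a tuple with one entry per array dimension.
print(data.shape)    # (4, 4)
print(labels.shape)  # (4,)

# `ndim` counts the dimensions: 2 for `data`, 1 for `labels`.
print(data.ndim, labels.ndim)  # 2 1

# Both arrays have the same length along their first axis.
assert data.shape[0] == labels.shape[0]
```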

To work with this data in DeepChem, we need to wrap these arrays so that DeepChem knows how to handle them. DeepChem provides a `Dataset` API for this purpose. For NumPy arrays, we use DeepChem's `NumpyDataset` object.

%% Cell type:code id: tags:

``` python
from deepchem.data.datasets import NumpyDataset

dataset = NumpyDataset(data, labels)
```

%% Cell type:markdown id: tags:

Ok, now what? We have these arrays in a `NumpyDataset` object. What can we do with it? Let's try printing out the object.

%% Cell type:code id: tags:

``` python
dataset
```

%% Output

    <deepchem.data.datasets.NumpyDataset at 0x7ff02682c710>

%% Cell type:markdown id: tags:

Ok, that's not terribly informative. It's telling us that `dataset` is a Python object that lives somewhere in memory. Can we recover the two datasets that we used to construct this object? Luckily, this

%% Cell type:code id: tags:

``` python
```