Commit 8b870301 (unverified), authored by Vignesh Ram Somnath, committed by GitHub

Merge pull request #7 from deepchem/master

Changes since 25.10.2018
parents db47d907 8b630252

PUBLICATIONS.md

0 → 100644
+8 −0
## DeepChem Publications

1. [Computational Modeling of β-secretase 1 (BACE-1) Inhibitors using
Ligand Based
Approaches](http://pubs.acs.org/doi/abs/10.1021/acs.jcim.6b00290)
2. [Low Data Drug Discovery with One-Shot Learning](http://pubs.acs.org/doi/abs/10.1021/acscentsci.6b00367)
3. [MoleculeNet: A Benchmark for Molecular Machine Learning](https://arxiv.org/abs/1703.00564)
4. [Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity](https://arxiv.org/abs/1703.10603)
+55 −100
@@ -12,45 +12,63 @@ democratizes the use of deep-learning in drug discovery, materials science, quan

* [Requirements](#requirements)
* [Installation](#installation)
    * [Easy Install with Conda](#easy-install-with-conda)
    * [Conda Environment](#using-a-conda-environment)
    * [Docker](#using-a-docker-image)
* [FAQ](#faq)
* [FAQ and Troubleshooting](#faq-and-troubleshooting)
* [Getting Started](#getting-started)
    * [Input Formats](#input-formats)
    * [Data Featurization](#data-featurization)
    * [Performances](#performances)
* [Contributing to DeepChem](/CONTRIBUTING.md)
    * [Code Style Guidelines](/CONTRIBUTING.md#code-style-guidelines)
    * [Documentation Style Guidelines](/CONTRIBUTING.md#documentation-style-guidelines)
    * [Gitter](#gitter)
* [DeepChem Publications](#deepchem-publications)
* [Corporate Supporters](#corporate-supporters)
    * [Schrödinger](#schrödinger)
    * [DeepCrystal](#deep-crystal)
* [Examples](/examples)
* [About Us](#about-us)
* [Citing DeepChem](#citing-deepchem)

## Requirements
* [pandas](http://pandas.pydata.org/)
* [rdkit](http://www.rdkit.org/docs/Install.html)
* [boost](http://www.boost.org/)
* [joblib](https://pypi.python.org/pypi/joblib)
* [sklearn](https://github.com/scikit-learn/scikit-learn.git)
* [numpy](https://store.continuum.io/cshop/anaconda/)
* [tensorflow](https://www.tensorflow.org/)

### Soft Requirements
DeepChem has a number of "soft" requirements. These are packages which are needed for various submodules of DeepChem but not for the package as a whole.

* [rdkit](http://www.rdkit.org/docs/Install.html)
* [six](https://pypi.python.org/pypi/six)
* [mdtraj](http://mdtraj.org/)
* [tensorflow](https://www.tensorflow.org/)

## Installation
### Easy Install via Conda

```deepchem``` currently supports both Python 2.7 and Python 3.5, and is supported on 64-bit Linux and Mac OSX. Please follow the directions below precisely. While you may already have system versions of some of our dependencies, there is no guarantee that `deepchem` will work with versions other than those specified below.
```bash
conda install -c deepchem -c rdkit -c conda-forge -c omnia deepchem=2.1.0
```
**Note:** `Easy Install` installs the latest stable version of `deepchem` and _does not install from source_. If you need to install from source make sure you follow the steps [here](#using-a-conda-environment).

Note that when using Ubuntu 16.04 server or similar environments, you may need to ensure libxrender is provided via e.g.:
### Using a Docker Image
Using a docker image requires an NVIDIA GPU. If you do not have a GPU, please follow the directions for [using a conda environment](#using-a-conda-environment).
In order to get GPU support you will have to use the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) plugin.
``` bash
sudo apt-get install -y libxrender-dev
# This will download the latest stable deepchem docker image into your images
docker pull deepchemio/deepchem

# This will create a container out of our latest image with GPU support
nvidia-docker run -i -t deepchemio/deepchem

# You are now in a docker container whose python has deepchem installed
# For example you can run our tox21 benchmark
cd deepchem/examples
python benchmark.py -d tox21

# Or you can start playing with it in the command line
pip install jupyter
ipython
import deepchem as dc
```

### Using a conda environment
### Installing from source in a conda environment
You can install deepchem in a new conda environment using the conda commands in scripts/install_deepchem_conda.sh
Installing via this script will ensure that you are **installing from the source**.

@@ -79,36 +97,11 @@ Check [this link](https://conda.io/docs/using/envs.html) for more information ab
the benefits and usage of conda environments. **Warning**: Segmentation faults can [still happen](https://github.com/deepchem/deepchem/pull/379#issuecomment-277013514)
via this installation procedure.

### Easy Install via Conda

```bash
conda install -c deepchem -c rdkit -c conda-forge -c omnia deepchem=2.1.0
```
**Note:** `Easy Install` installs the latest stable version of `deepchem` and _does not install from source_. If you need to install from source make sure you follow the steps [here](#using-a-conda-environment).
## FAQ and Troubleshooting

### Using a Docker Image
Using a docker image requires an NVIDIA GPU. If you do not have a GPU, please follow the directions for [using a conda environment](#using-a-conda-environment).
In order to get GPU support you will have to use the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) plugin.
``` bash
# This will download the latest stable deepchem docker image into your images
docker pull deepchemio/deepchem

# This will create a container out of our latest image with GPU support
nvidia-docker run -i -t deepchemio/deepchem

# You are now in a docker container whose python has deepchem installed
# For example you can run our tox21 benchmark
cd deepchem/examples
python benchmark.py -d tox21

# Or you can start playing with it in the command line
pip install jupyter
ipython
import deepchem as dc
```

## FAQ
1. Question: I'm seeing some failures in my test suite having to do with MKL
1. ```deepchem``` currently supports both Python 2.7 and Python 3.5, and is supported on 64 bit Linux and Mac OSX. Note that DeepChem is not currently maintained for Python 3.6 or with other operating systems. 
2. Question: I'm seeing some failures in my test suite having to do with MKL
   ```Intel MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so.```

   Answer: This is a general issue with the newest version of `scikit-learn` enabling MKL by default. This doesn't play well with many linux systems. See [BVLC/caffe#3884](https://github.com/BVLC/caffe/issues/3884) for discussions. The following seems to fix the issue
@@ -116,80 +109,42 @@ import deepchem as dc
   conda install nomkl numpy scipy scikit-learn numexpr
   conda remove mkl mkl-service
   ```
3.  Note that when using Ubuntu 16.04 server or similar environments, you may need to ensure libxrender is provided via e.g.:
   ```bash
   sudo apt-get install -y libxrender-dev
   ```

## Getting Started
Two good tutorials to get started are [Graph Convolutional Networks](https://deepchem.io/docs/notebooks/graph_convolutional_networks_for_tox21.html) and [Multitask_Networks_on_MUV](https://deepchem.io/docs/notebooks/Multitask_Networks_on_MUV.html). Follow along with the tutorials to see how to predict properties on molecules using neural networks.

Afterwards you can go through other [tutorials](https://deepchem.io/docs/notebooks/index.html), and look through our examples in the `examples` directory. To apply `deepchem` to a new problem, try starting from one of the existing examples or tutorials and modifying it step by step to work with your new use-case. If you have questions or comments you can raise them on our [gitter](https://gitter.im/deepchem/Lobby).

### Input Formats
Accepted input formats for deepchem include csv, pkl.gz, and sdf files. For
example, with a csv input, in order to build models, we expect the
following columns to have entries for each row in the csv file.

1. A column containing SMILES strings [1].
2. A column containing an experimental measurement.
3. (Optional) A column containing a unique compound identifier.

Here's an example of a potential input file.

|Compound ID    | measured log solubility in mols per litre | smiles         |
|---------------|-------------------------------------------|----------------|
| benzothiazole | -1.5                                      | c2ccc1scnc1c2  |


Here the "smiles" column contains the SMILES string, the "measured log
solubility in mols per litre" contains the experimental measurement and
"Compound ID" contains the unique compound identifier.

[1] Anderson, Eric, Gilman D. Veith, and David Weininger. "SMILES, a line
notation and computerized interpreter for chemical structures." US
Environmental Protection Agency, Environmental Research Laboratory, 1987.
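
To make the expected column layout concrete, here is a minimal sketch that parses the example row above using only the Python standard library. It does not use DeepChem; the helper name `read_rows` and its keyword arguments are hypothetical, chosen only for illustration.

```python
import csv
import io

# The example input file from the table above, written as CSV text.
CSV_TEXT = """Compound ID,measured log solubility in mols per litre,smiles
benzothiazole,-1.5,c2ccc1scnc1c2
"""


def read_rows(text,
              smiles_col="smiles",
              task_col="measured log solubility in mols per litre",
              id_col="Compound ID"):
  """Hypothetical helper: pull the three expected columns out of CSV text."""
  rows = []
  for record in csv.DictReader(io.StringIO(text)):
    rows.append({
        "id": record.get(id_col),       # optional compound identifier
        "smiles": record[smiles_col],   # required SMILES string
        "y": float(record[task_col]),   # required experimental measurement
    })
  return rows


rows = read_rows(CSV_TEXT)
```

Each parsed row carries the SMILES string, the measurement as a float, and the identifier if the column is present.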

### Data Featurization

Most machine learning algorithms require that input data form vectors.
However, input data for drug-discovery datasets routinely come in the
format of lists of molecules and associated experimental readouts. To
transform lists of molecules into vectors, we need to use subclasses of the DeepChem
loader class ```dc.data.DataLoader``` such as ```dc.data.CSVLoader``` or
```dc.data.SDFLoader```. Users can subclass ```dc.data.DataLoader``` to
load arbitrary file formats. All loaders must be
passed a ```dc.feat.Featurizer``` object. DeepChem provides a number of
different subclasses of ```dc.feat.Featurizer``` for convenience.
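
The loader/featurizer split described above can be sketched in plain Python. This is an illustration of the pattern only, not DeepChem's implementation: `CharCountFeaturizer` and `ToyCSVLoader` are hypothetical stand-ins for a `dc.feat.Featurizer` subclass and a `dc.data.DataLoader` subclass, and the character-count features are a toy choice.

```python
import csv
import io


class CharCountFeaturizer:
  """Hypothetical featurizer: turns a SMILES string into a fixed-length vector
  by counting how often each character of `alphabet` occurs."""

  def __init__(self, alphabet="cnos"):
    self.alphabet = alphabet

  def featurize(self, smiles):
    lowered = smiles.lower()
    return [lowered.count(ch) for ch in self.alphabet]


class ToyCSVLoader:
  """Hypothetical loader: reads CSV text and delegates vectorization
  to the featurizer object it was constructed with."""

  def __init__(self, task, smiles_field, featurizer):
    self.task = task
    self.smiles_field = smiles_field
    self.featurizer = featurizer

  def load(self, text):
    X, y = [], []
    for record in csv.DictReader(io.StringIO(text)):
      X.append(self.featurizer.featurize(record[self.smiles_field]))
      y.append(float(record[self.task]))
    return X, y


text = "smiles,logS\nc2ccc1scnc1c2,-1.5\n"
loader = ToyCSVLoader("logS", "smiles", CharCountFeaturizer())
X, y = loader.load(text)
```

Keeping the featurizer as a separate object means the same file-reading code can be reused with different molecular representations.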

### Performances
In-depth performance tables for DeepChem models are available on [MoleculeNet.ai](https://moleculenet.ai).
### Benchmarks
In-depth benchmarking tables for DeepChem models are available on [MoleculeNet.ai](https://moleculenet.ai).

### Gitter
Join us on gitter at [https://gitter.im/deepchem/Lobby](https://gitter.im/deepchem/Lobby). It is probably the easiest place to ask simple questions or float requests for new features.

## DeepChem Publications
1. [Computational Modeling of β-secretase 1 (BACE-1) Inhibitors using
Ligand Based
Approaches](http://pubs.acs.org/doi/abs/10.1021/acs.jcim.6b00290)
2. [Low Data Drug Discovery with One-Shot Learning](http://pubs.acs.org/doi/abs/10.1021/acscentsci.6b00367)
3. [MoleculeNet: A Benchmark for Molecular Machine Learning](https://arxiv.org/abs/1703.00564)
4. [Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity](https://arxiv.org/abs/1703.10603)

## About Us
DeepChem is possible due to notable contributions from many people including Peter Eastman, Evan Feinberg, Joe Gomes, Karl Leswing, Vijay Pande, Aneesh Pappu, Bharath Ramsundar and Michael Wu (alphabetical ordering).  DeepChem was originally created by [Bharath Ramsundar](http://rbharath.github.io/) with encouragement and guidance from [Vijay Pande](https://pande.stanford.edu/).

DeepChem started as a [Pande group](https://pande.stanford.edu/) project at Stanford, and is now developed by many academic and industrial collaborators. DeepChem actively encourages new academic and industrial groups to contribute!

## Corporate Supporters
DeepChem is supported by a number of corporate partners who use DeepChem to solve interesting problems.
## Citing DeepChem

### Schrödinger
[![Schrödinger](https://github.com/deepchem/deepchem/blob/master/docs/source/_static/schrodinger_logo.png)](https://www.schrodinger.com/)
If you have used DeepChem in the course of your research, we ask that you cite the "Deep Learning for the Life Sciences" book by the DeepChem core team.

> DeepChem has transformed how we think about building QSAR and QSPR models when very large data sets are available; and we are actively using DeepChem to investigate how to best combine the power of deep learning with next generation physics-based scoring methods.

### DeepCrystal
<img src="https://github.com/deepchem/deepchem/blob/master/docs/source/_static/deep_crystal_logo.png" alt="DeepCrystal Logo" height=150px/>

> DeepCrystal was an early adopter of DeepChem, which we now rely on to abstract away some of the hardest pieces of deep learning in drug discovery. By open sourcing these efficient implementations of chemically / biologically aware deep-learning systems, DeepChem puts the latest research into the hands of the scientists that need it, materially pushing forward the field of in-silico drug discovery in the process.
To cite this book, please use this bibtex entry:

```
@book{Ramsundar-et-al-2019,
    title={Deep Learning for the Life Sciences},
    author={Bharath Ramsundar and Peter Eastman and Karl Leswing and Patrick Walters and Vijay Pande},
    publisher={O'Reilly Media},
    note={\url{https://www.amazon.com/Deep-Learning-Life-Sciences-Microscopy/dp/1492039837}},
    year={2019}
}
```

## Version
2.1.0

SUPPORTERS.md

0 → 100644
+14 −0
## Corporate Supporters
DeepChem is supported by a number of corporate partners who use DeepChem to solve interesting problems.

### Schrödinger
[![Schrödinger](https://github.com/deepchem/deepchem/blob/master/docs/source/_static/schrodinger_logo.png)](https://www.schrodinger.com/)

> DeepChem has transformed how we think about building QSAR and QSPR models when very large data sets are available; and we are actively using DeepChem to investigate how to best combine the power of deep learning with next generation physics-based scoring methods.

### DeepCrystal
<img src="https://github.com/deepchem/deepchem/blob/master/docs/source/_static/deep_crystal_logo.png" alt="DeepCrystal Logo" height=150px/>

> DeepCrystal was an early adopter of DeepChem, which we now rely on to abstract away some of the hardest pieces of deep learning in drug discovery. By open sourcing these efficient implementations of chemically / biologically aware deep-learning systems, DeepChem puts the latest research into the hands of the scientists that need it, materially pushing forward the field of in-silico drug discovery in the process.

+1 −1
@@ -52,7 +52,7 @@ def load_images_DR(split='random', seed=None):

  loader = deepchem.data.ImageLoader()
  dat = loader.featurize(
      image_full_paths, labels=labels, weights=weights, read_img=False)
      image_full_paths, labels=labels, weights=weights)
  if split == None:
    return dat

+4 −30
@@ -63,7 +63,7 @@ class DRModel(TensorGraph):
  def build_graph(self):
    # inputs placeholder
    self.inputs = Feature(
        shape=(None, self.image_size, self.image_size, 3), dtype=tf.uint8)
        shape=(None, self.image_size, self.image_size, 3), dtype=tf.float32)
    # data preprocessing and augmentation
    in_layer = DRAugment(
        self.augment,
@@ -142,32 +142,6 @@ class DRModel(TensorGraph):
    # weighted_loss = WeightDecay(0.1, 'l2', in_layers=[weighted_loss])
    self.set_loss(weighted_loss)

  def default_generator(self,
                        dataset,
                        epochs=1,
                        predict=False,
                        deterministic=True,
                        pad_batches=True):
    for epoch in range(epochs):
      for (X_b, y_b, w_b, ids_b) in dataset.iterbatches(
          batch_size=self.batch_size,
          deterministic=deterministic,
          pad_batches=pad_batches):
        feed_dict = dict()

        if None in X_b:
          # load images on the fly
          feed_dict[self.features[0]] = ImageLoader.load_img(ids_b)
        else:
          feed_dict[self.features[0]] = X_b

        if y_b is not None and not predict:
          feed_dict[self.labels[0]] = y_b
        if w_b is not None and not predict:
          feed_dict[self.task_weights[0]] = w_b

        yield feed_dict


def DRAccuracy(y, y_pred):
  y_pred = np.argmax(y_pred, 1)
@@ -248,7 +222,7 @@ class DRAugment(Layer):
    parent_tensor = inputs[0]
    training = kwargs['training'] if 'training' in kwargs else 1.0

    parent_tensor = tf.image.convert_image_dtype(parent_tensor, tf.float32)
    parent_tensor = parent_tensor / 255.0
    if not self.augment:
      out_tensor = parent_tensor
    else:
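
Review note on the `DRAugment` hunk: the input placeholder changes from `tf.uint8` to `tf.float32`, and `tf.image.convert_image_dtype` only rescales integer images into [0, 1]; for input that is already floating point it performs no rescaling. That is presumably why the hunk switches to an explicit division by 255. The pure-Python sketch below mimics the two behaviors on a few pixel values, assuming pixels stored in the 0-255 range. `convert_like_tf` and `explicit_rescale` are hypothetical stand-ins, not TensorFlow code.

```python
def convert_like_tf(pixels, src_is_uint8):
  """Hypothetical stand-in for a dtype conversion to float32:
  integer input is scaled by 1/255, float input is passed through."""
  if src_is_uint8:
    return [p / 255.0 for p in pixels]
  return [float(p) for p in pixels]


def explicit_rescale(pixels):
  """What the new line in the hunk does: always divide by 255."""
  return [p / 255.0 for p in pixels]


raw = [0, 51, 255]
raw_as_float = [float(p) for p in raw]

# Old path with uint8 input: values end up in [0, 1].
from_uint8 = convert_like_tf(raw, src_is_uint8=True)

# Old call applied to float32 input: values stay in 0-255.
from_float = convert_like_tf(raw_as_float, src_is_uint8=False)

# New path with float32 input: values end up in [0, 1] again.
rescaled = explicit_rescale(raw_as_float)
```

With the float placeholder, only the explicit division reproduces the normalized range the downstream layers saw before the change.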