Commit 1a87a4d1 authored by Haozhen Wu, committed by GitHub

Merge branch 'master' into xgboostModel

parents 6e6f1e8e 9eef9e66

CONTRIBUTING.md

0 → 100644
+34 −0
# DeepChem

## Contributing to DeepChem

We actively encourage community contributions to DeepChem. The best way to get involved is to start by running our examples locally. After that, we encourage contributors to try improving our documentation. While we make an effort to provide good docs, there's plenty of room for improvement. All docs are hosted on GitHub, either in the `README.md` file or in the `docs/` directory.

Once you've got a sense of how the package works, we encourage the use of GitHub issues to discuss more complex changes, raise requests for new features, or propose changes to the global architecture of DeepChem. Once consensus is reached on the issue, please submit a PR with the proposed modifications. All code contributed to DeepChem will be reviewed by a member of the DeepChem team, so please make sure your code style and documentation style match our guidelines!

## Contributor License Agreement
To get a pull request accepted, you must fill out our [License Agreement](https://www.clahub.com/agreements/lilleswing/deepchem). The purpose of this agreement is to enable DeepChem to distribute your code and its derivatives.

### The Agreement
Contributor offers to license certain software (a “Contribution” or multiple “Contributions”) to DeepChem, and DeepChem agrees to accept said Contributions, under the terms of the open source license [The MIT License](https://opensource.org/licenses/MIT).


The Contributor understands and agrees that DeepChem shall have the irrevocable and perpetual right to make and distribute copies of any Contribution, as well as to create and distribute collective works and derivative works of any Contribution, under [The MIT License](https://opensource.org/licenses/MIT).


DeepChem understands and agrees that Contributor retains copyright in its Contributions. Nothing in this Contributor Agreement shall be interpreted to prohibit Contributor from licensing its Contributions under different terms from [The MIT License](https://opensource.org/licenses/MIT) or this Contributor Agreement.

### Code Style Guidelines
DeepChem uses [yapf](https://github.com/google/yapf) to autoformat code.

``` bash
pip install yapf==0.16.0
cd <git_root>
yapf -i <python_files changed>
```

Our integration tests will fail if code is not formatted correctly.
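
As a rough illustration of the workflow above, the sketch below collects the Python files that differ from a base branch and formats them in place with `yapf -i`. The helper names (`build_yapf_command`, `format_changed_files`) and the default base branch `master` are assumptions made for this example; they are not part of DeepChem's tooling.

```python
import subprocess


def build_yapf_command(paths):
  """Return the argv for an in-place yapf run over the .py files in `paths`."""
  python_files = [path for path in paths if path.endswith(".py")]
  return ["yapf", "-i"] + python_files


def format_changed_files(base="master"):
  """Run yapf on Python files that differ from `base` (hypothetical helper).

  Must be called from inside the git repository. Deleted files are
  excluded so that yapf is never handed a path that no longer exists.
  """
  diff = subprocess.run(
      ["git", "diff", "--name-only", "--diff-filter=ACMR", base],
      check=True,
      stdout=subprocess.PIPE,
      universal_newlines=True)
  command = build_yapf_command(diff.stdout.splitlines())
  if len(command) > 2:  # only invoke yapf when there is something to format
    subprocess.run(command, check=True)
  return command
```

Calling `format_changed_files()` before committing should leave the working tree in the state that the formatting check expects, assuming the pinned yapf version from the snippet above is installed.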

### Documentation Style Guidelines
DeepChem uses [NumPy style documentation](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt). Please follow these conventions when documenting code, since we use [Sphinx+Napoleon](http://www.sphinx-doc.org/en/stable/ext/napoleon.html) to automatically generate docs on [deepchem.io](http://deepchem.io).
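
For readers unfamiliar with the convention, here is a minimal sketch of a NumPy-style docstring. The function `scale_features` is a hypothetical example written for this note and does not exist in DeepChem; only the section layout (`Parameters`, `Returns`, underlined with dashes) is the point.

```python
def scale_features(features, factor=1.0):
  """Scale each feature value by a constant factor.

  Parameters
  ----------
  features : list of float
    Feature values to scale.
  factor : float, optional
    Multiplier applied to every value. Defaults to 1.0.

  Returns
  -------
  list of float
    A new list with each input value multiplied by `factor`.
  """
  return [value * factor for value in features]
```

Napoleon parses the `Parameters` and `Returns` sections into structured fields, so keeping the headings and dashed underlines exact matters more than the prose style inside them.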

LICENSE

100644 → 100755
+4 −524

File changed. File mode changed from 100644 to 100755.

Preview size limit exceeded, changes collapsed.

+6 −18
@@ -44,7 +44,7 @@ You can install deepchem in a new conda environment using the conda commands in

```bash
bash scripts/install_deepchem_conda.sh deepchem
pip install tensorflow-gpu==0.12.1                      # If you want GPU support
pip install tensorflow-gpu==1.0.1                      # If you want GPU support
git clone https://github.com/deepchem/deepchem.git      # Clone deepchem source code from GitHub
cd deepchem
python setup.py install                                 # Manual install
@@ -95,7 +95,7 @@ via this installation procedure.
    contact your local sysadmin to work out a custom installation. If your
    version of Linux is recent, then the following command will work:
    ```
    pip install tensorflow-gpu==0.12.1
    pip install tensorflow-gpu==1.0.1
    ```

9. `deepchem`: Clone the `deepchem` github repo:
@@ -509,22 +509,6 @@ Time needed for benchmark test(~20h in total)
|kaggle          |MT-NN regression    |2200            |3200           |


## Contributing to DeepChem

We actively encourage community contributions to DeepChem. The first place to start getting involved is by running our examples locally. Afterwards, we encourage contributors to give a shot to improving our documentation. While we take effort to provide good docs, there's plenty of room for improvement. All docs are hosted on Github, either in this `README.md` file, or in the `docs/` directory.

Once you've got a sense of how the package works, we encourage the use of Github issues to discuss more complex changes,  raise requests for new features or propose changes to the global architecture of DeepChem. Once consensus is reached on the issue, please submit a PR with proposed modifications. All contributed code to DeepChem will be reviewed by a member of the DeepChem team, so please make sure your code style and documentation style match our guidelines!

### Code Style Guidelines
DeepChem uses [yapf](https://github.com/google/yapf) to autoformat code.  We created a git pre-commit hook to make this process easier.

``` bash
cp devtools/travis-ci/pre-commit .git/hooks
pip install yapf==0.16.0
```

### Documentation Style Guidelines
DeepChem uses [NumPy style documentation](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt). Please follow these conventions when documenting code, since we use [Sphinx+Napoleon](http://www.sphinx-doc.org/en/stable/ext/napoleon.html) to automatically generate docs on [deepchem.io](deepchem.io).

### Gitter
Join us on gitter at [https://gitter.im/deepchem/Lobby](https://gitter.im/deepchem/Lobby). Probably the easiest place to ask simple questions or float requests for new features.
@@ -538,3 +522,7 @@ Approaches](http://pubs.acs.org/doi/abs/10.1021/acs.jcim.6b00290)

## About Us
DeepChem is a package by the [Pande group](https://pande.stanford.edu/) at Stanford. DeepChem was originally created by [Bharath Ramsundar](http://rbharath.github.io/), and has grown through the contributions of a number of undergraduate, graduate, and postdoctoral researchers working with the Pande lab.


## Version
1.0.1
+1 −1
@@ -18,7 +18,7 @@ import shutil

__author__ = "Bharath Ramsundar"
__copyright__ = "Copyright 2016, Stanford University"
__license__ = "GPL"
__license__ = "MIT"


def sparsify_features(X):
+35 −23
@@ -7,7 +7,7 @@ from __future__ import unicode_literals

__author__ = "Bharath Ramsundar"
__copyright__ = "Copyright 2016, Stanford University"
__license__ = "GPL"
__license__ = "MIT"

import unittest
import tempfile
@@ -16,6 +16,7 @@ import shutil
import numpy as np
import deepchem as dc


def load_solubility_data():
  """Loads solubility dataset"""
  current_dir = os.path.dirname(os.path.abspath(__file__))
@@ -28,39 +29,45 @@ def load_solubility_data():

  return loader.featurize(input_file)


def load_butina_data():
  """Loads solubility dataset"""
  current_dir = os.path.dirname(os.path.abspath(__file__))
  featurizer = dc.feat.CircularFingerprint(size=1024)
  tasks = ["task"]
  # task_type = "regression"
  input_file = os.path.join(current_dir, "../../models/tests/butina_example.csv")
  input_file = os.path.join(current_dir,
                            "../../models/tests/butina_example.csv")
  loader = dc.data.CSVLoader(
      tasks=tasks, smiles_field="smiles", featurizer=featurizer)

  return loader.featurize(input_file)


def load_multitask_data():
  """Load example multitask data."""
  current_dir = os.path.dirname(os.path.abspath(__file__))
  featurizer = dc.feat.CircularFingerprint(size=1024)
  tasks = ["task0", "task1", "task2", "task3", "task4", "task5", "task6",
           "task7", "task8", "task9", "task10", "task11", "task12",
           "task13", "task14", "task15", "task16"]
  input_file = os.path.join(
      current_dir, "../../models/tests/multitask_example.csv")
  tasks = [
      "task0", "task1", "task2", "task3", "task4", "task5", "task6", "task7",
      "task8", "task9", "task10", "task11", "task12", "task13", "task14",
      "task15", "task16"
  ]
  input_file = os.path.join(current_dir,
                            "../../models/tests/multitask_example.csv")
  loader = dc.data.CSVLoader(
      tasks=tasks, smiles_field="smiles", featurizer=featurizer)
  return loader.featurize(input_file)


def load_classification_data():
  """Loads classification data from example.csv"""
  current_dir = os.path.dirname(os.path.abspath(__file__))
  featurizer = dc.feat.CircularFingerprint(size=1024)
  tasks = ["outcome"]
  task_type = "classification"
  input_file = os.path.join(
      current_dir, "../../models/tests/example_classification.csv")
  input_file = os.path.join(current_dir,
                            "../../models/tests/example_classification.csv")
  loader = dc.data.CSVLoader(
      tasks=tasks, smiles_field="smiles", featurizer=featurizer)
  return loader.featurize(input_file)
@@ -70,26 +77,30 @@ def load_sparse_multitask_dataset():
  """Load sparse tox multitask data, sample dataset."""
  current_dir = os.path.dirname(os.path.abspath(__file__))
  featurizer = dc.feat.CircularFingerprint(size=1024)
  tasks = ["task1", "task2", "task3", "task4", "task5", "task6",
           "task7", "task8", "task9"]
  input_file = os.path.join(
      current_dir, "../../models/tests/sparse_multitask_example.csv")
  tasks = [
      "task1", "task2", "task3", "task4", "task5", "task6", "task7", "task8",
      "task9"
  ]
  input_file = os.path.join(current_dir,
                            "../../models/tests/sparse_multitask_example.csv")
  loader = dc.data.CSVLoader(
      tasks=tasks, smiles_field="smiles", featurizer=featurizer)
  return loader.featurize(input_file)


def load_feat_multitask_data():
  """Load example with numerical features, tasks."""
  current_dir = os.path.dirname(os.path.abspath(__file__))
  features = ["feat0", "feat1", "feat2", "feat3", "feat4", "feat5"]
  featurizer = dc.feat.UserDefinedFeaturizer(features)
  tasks = ["task0", "task1", "task2", "task3", "task4", "task5"]
  input_file = os.path.join(
      current_dir, "../../models/tests/feat_multitask_example.csv")
  input_file = os.path.join(current_dir,
                            "../../models/tests/feat_multitask_example.csv")
  loader = dc.data.UserCSVLoader(
      tasks=tasks, featurizer=featurizer, id_field="id")
  return loader.featurize(input_file)


def load_gaussian_cdf_data():
  """Load example with numbers sampled from Gaussian normal distribution.
     Each feature and task is a column of values that is sampled
@@ -98,12 +109,13 @@ def load_gaussian_cdf_data():
  features = ["feat0", "feat1"]
  featurizer = dc.feat.UserDefinedFeaturizer(features)
  tasks = ["task0", "task1"]
  input_file = os.path.join(
      current_dir, "../../models/tests/gaussian_cdf_example.csv")
  input_file = os.path.join(current_dir,
                            "../../models/tests/gaussian_cdf_example.csv")
  loader = dc.data.UserCSVLoader(
      tasks=tasks, featurizer=featurizer, id_field="id")
  return loader.featurize(input_file)


def load_unlabelled_data():
  current_dir = os.path.dirname(os.path.abspath(__file__))
  featurizer = dc.feat.CircularFingerprint(size=1024)