Merge branch 'master' into xgboostModel (1a87a4d1) · Commits · 钟慕尧 / deepchem

CONTRIBUTING.md

0 → 100644

+34 −0

		# DeepChem

		## Contributing to DeepChem

		We actively encourage community contributions to DeepChem. The best way to get started is to run our examples locally. After that, we encourage contributors to try improving our documentation. While we make an effort to provide good docs, there's plenty of room for improvement. All docs are hosted on GitHub, either in the `README.md` file or in the `docs/` directory.

		Once you've got a sense of how the package works, we encourage the use of GitHub issues to discuss more complex changes, raise requests for new features, or propose changes to the global architecture of DeepChem. Once consensus is reached on the issue, please submit a PR with the proposed modifications. All code contributed to DeepChem will be reviewed by a member of the DeepChem team, so please make sure your code style and documentation style match our guidelines!

		## Contributor License Agreement
		In order to get a pull request accepted you must fill out our [License Agreement](https://www.clahub.com/agreements/lilleswing/deepchem). The purpose of this agreement is to enable DeepChem to distribute your code and its derivatives.

		### The Agreement
		Contributor offers to license certain software (a “Contribution” or multiple “Contributions”) to DeepChem, and DeepChem agrees to accept said Contributions, under the terms of the open source license [The MIT License](https://opensource.org/licenses/MIT).


		The Contributor understands and agrees that DeepChem shall have the irrevocable and perpetual right to make and distribute copies of any Contribution, as well as to create and distribute collective works and derivative works of any Contribution, under [The MIT License](https://opensource.org/licenses/MIT).


		DeepChem understands and agrees that Contributor retains copyright in its Contributions. Nothing in this Contributor Agreement shall be interpreted to prohibit Contributor from licensing its Contributions under different terms from [The MIT License](https://opensource.org/licenses/MIT) or this Contributor Agreement.

		### Code Style Guidelines
		DeepChem uses [yapf](https://github.com/google/yapf) to autoformat code.

		``` bash
		pip install yapf==0.16.0
		cd <git_root>
		yapf -i <python_files changed>
		```

		Our integration tests will fail if code is not formatted correctly.
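
As a rough sketch of how the formatting step above could be scripted, the Python helper below builds the `yapf` command line for either rewriting files in place (`-i`) or only printing the changes (`--diff`). The names `build_yapf_command` and `run_yapf` are hypothetical and are not part of DeepChem; only the `-i` and `--diff` flags come from yapf itself.

```python
import subprocess


def build_yapf_command(files, in_place=False):
  """Build an argument list for running yapf (hypothetical helper)."""
  files = list(files)
  if not files:
    raise ValueError("expected at least one file to format")
  # -i rewrites the files in place; --diff only prints the changes.
  mode = "-i" if in_place else "--diff"
  return ["yapf", mode] + files


def run_yapf(files, in_place=False):
  """Run yapf and return its exit code; assumes yapf is on the PATH."""
  return subprocess.call(build_yapf_command(files, in_place=in_place))
```

Under these assumptions, `run_yapf(["deepchem/data/datasets.py"], in_place=True)` would be equivalent to running `yapf -i deepchem/data/datasets.py` from the git root.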

		### Documentation Style Guidelines
		DeepChem uses [NumPy style documentation](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt). Please follow these conventions when documenting code, since we use [Sphinx+Napoleon](http://www.sphinx-doc.org/en/stable/ext/napoleon.html) to automatically generate docs on [deepchem.io](deepchem.io).
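
For illustration, a NumPy-style docstring might look like the sketch below. The function `mean_absolute_error` is a made-up example and not part of DeepChem's API; only the section layout (`Parameters`, `Returns`, `Raises`) reflects the NumPy convention.

```python
def mean_absolute_error(y_true, y_pred):
  """Compute the mean absolute error between two sequences.

  Parameters
  ----------
  y_true : sequence of float
    Ground-truth values.
  y_pred : sequence of float
    Predicted values, same length as `y_true`.

  Returns
  -------
  float
    The average of ``abs(t - p)`` over all pairs.

  Raises
  ------
  ValueError
    If the inputs are empty or differ in length.
  """
  if len(y_true) != len(y_pred) or len(y_true) == 0:
    raise ValueError("inputs must be non-empty and of equal length")
  return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Sphinx with the Napoleon extension should be able to turn the underlined sections into structured parameter and return-value listings.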

LICENSE

100644 → 100755

+4 −524

File changed. File mode changed from 100644 to 100755.

Preview size limit exceeded, changes collapsed.

README.md

+6 −18

		@@ -44,7 +44,7 @@ You can install deepchem in a new conda environment using the conda commands in

		 ```bash
		 bash scripts/install_deepchem_conda.sh deepchem
		-pip install tensorflow-gpu==0.12.1 # If you want GPU support
		+pip install tensorflow-gpu==1.0.1 # If you want GPU support
		 git clone https://github.com/deepchem/deepchem.git # Clone deepchem source code from GitHub
		 cd deepchem
		 python setup.py install # Manual install
		@@ -95,7 +95,7 @@ via this installation procedure.
		 contact your local sysadmin to work out a custom installation. If your
		 version of Linux is recent, then the following command will work:
		 ```
		-pip install tensorflow-gpu==0.12.1
		+pip install tensorflow-gpu==1.0.1
		 ```

		 9. `deepchem`: Clone the `deepchem` github repo:
		@@ -509,22 +509,6 @@ Time needed for benchmark test(~20h in total)
		 |kaggle |MT-NN regression |2200 |3200 |


		-## Contributing to DeepChem
		-
		-We actively encourage community contributions to DeepChem. The first place to start getting involved is by running our examples locally. Afterwards, we encourage contributors to give a shot to improving our documentation. While we take effort to provide good docs, there's plenty of room for improvement. All docs are hosted on Github, either in this `README.md` file, or in the `docs/` directory.
		-
		-Once you've got a sense of how the package works, we encourage the use of Github issues to discuss more complex changes, raise requests for new features or propose changes to the global architecture of DeepChem. Once consensus is reached on the issue, please submit a PR with proposed modifications. All contributed code to DeepChem will be reviewed by a member of the DeepChem team, so please make sure your code style and documentation style match our guidelines!
		-
		-### Code Style Guidelines
		-DeepChem uses [yapf](https://github.com/google/yapf) to autoformat code. We created a git pre-commit hook to make this process easier.
		-
		-``` bash
		-cp devtools/travis-ci/pre-commit .git/hooks
		-pip install yapf==0.16.0
		-```
		-
		-### Documentation Style Guidelines
		-DeepChem uses [NumPy style documentation](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt). Please follow these conventions when documenting code, since we use [Sphinx+Napoleon](http://www.sphinx-doc.org/en/stable/ext/napoleon.html) to automatically generate docs on [deepchem.io](deepchem.io).

		 ### Gitter
		 Join us on gitter at [https://gitter.im/deepchem/Lobby](https://gitter.im/deepchem/Lobby). Probably the easiest place to ask simple questions or float requests for new features.
		@@ -538,3 +522,7 @@ Approaches](http://pubs.acs.org/doi/abs/10.1021/acs.jcim.6b00290)

		 ## About Us
		 DeepChem is a package by the [Pande group](https://pande.stanford.edu/) at Stanford. DeepChem was originally created by [Bharath Ramsundar](http://rbharath.github.io/), and has grown through the contributions of a number of undergraduate, graduate, and postdoctoral researchers working with the Pande lab.
		+
		+
		+## Version
		+1.0.1

deepchem/data/datasets.py

+1 −1

		@@ -18,7 +18,7 @@ import shutil

		 __author__ = "Bharath Ramsundar"
		 __copyright__ = "Copyright 2016, Stanford University"
		-__license__ = "GPL"
		+__license__ = "MIT"


		 def sparsify_features(X):

deepchem/data/tests/init.py

+35 −23

		@@ -7,7 +7,7 @@ from __future__ import unicode_literals

		 __author__ = "Bharath Ramsundar"
		 __copyright__ = "Copyright 2016, Stanford University"
		-__license__ = "GPL"
		+__license__ = "MIT"

		 import unittest
		 import tempfile
		@@ -16,6 +16,7 @@ import shutil
		 import numpy as np
		 import deepchem as dc

		+
		 def load_solubility_data():
		   """Loads solubility dataset"""
		   current_dir = os.path.dirname(os.path.abspath(__file__))
		@@ -28,39 +29,45 @@ def load_solubility_data():

		   return loader.featurize(input_file)

		+
		 def load_butina_data():
		   """Loads solubility dataset"""
		   current_dir = os.path.dirname(os.path.abspath(__file__))
		   featurizer = dc.feat.CircularFingerprint(size=1024)
		   tasks = ["task"]
		   # task_type = "regression"
		-  input_file = os.path.join(current_dir, "../../models/tests/butina_example.csv")
		+  input_file = os.path.join(current_dir,
		+                            "../../models/tests/butina_example.csv")
		   loader = dc.data.CSVLoader(
		       tasks=tasks, smiles_field="smiles", featurizer=featurizer)

		   return loader.featurize(input_file)

		+
		 def load_multitask_data():
		   """Load example multitask data."""
		   current_dir = os.path.dirname(os.path.abspath(__file__))
		   featurizer = dc.feat.CircularFingerprint(size=1024)
		-  tasks = ["task0", "task1", "task2", "task3", "task4", "task5", "task6",
		-           "task7", "task8", "task9", "task10", "task11", "task12",
		-           "task13", "task14", "task15", "task16"]
		-  input_file = os.path.join(
		-      current_dir, "../../models/tests/multitask_example.csv")
		+  tasks = [
		+      "task0", "task1", "task2", "task3", "task4", "task5", "task6", "task7",
		+      "task8", "task9", "task10", "task11", "task12", "task13", "task14",
		+      "task15", "task16"
		+  ]
		+  input_file = os.path.join(current_dir,
		+                            "../../models/tests/multitask_example.csv")
		   loader = dc.data.CSVLoader(
		       tasks=tasks, smiles_field="smiles", featurizer=featurizer)
		   return loader.featurize(input_file)

		+
		 def load_classification_data():
		   """Loads classification data from example.csv"""
		   current_dir = os.path.dirname(os.path.abspath(__file__))
		   featurizer = dc.feat.CircularFingerprint(size=1024)
		   tasks = ["outcome"]
		   task_type = "classification"
		-  input_file = os.path.join(
		-      current_dir, "../../models/tests/example_classification.csv")
		+  input_file = os.path.join(current_dir,
		+                            "../../models/tests/example_classification.csv")
		   loader = dc.data.CSVLoader(
		       tasks=tasks, smiles_field="smiles", featurizer=featurizer)
		   return loader.featurize(input_file)
		@@ -70,26 +77,30 @@ def load_sparse_multitask_dataset():
		   """Load sparse tox multitask data, sample dataset."""
		   current_dir = os.path.dirname(os.path.abspath(__file__))
		   featurizer = dc.feat.CircularFingerprint(size=1024)
		-  tasks = ["task1", "task2", "task3", "task4", "task5", "task6",
		-           "task7", "task8", "task9"]
		-  input_file = os.path.join(
		-      current_dir, "../../models/tests/sparse_multitask_example.csv")
		+  tasks = [
		+      "task1", "task2", "task3", "task4", "task5", "task6", "task7", "task8",
		+      "task9"
		+  ]
		+  input_file = os.path.join(current_dir,
		+                            "../../models/tests/sparse_multitask_example.csv")
		   loader = dc.data.CSVLoader(
		       tasks=tasks, smiles_field="smiles", featurizer=featurizer)
		   return loader.featurize(input_file)

		+
		 def load_feat_multitask_data():
		   """Load example with numerical features, tasks."""
		   current_dir = os.path.dirname(os.path.abspath(__file__))
		   features = ["feat0", "feat1", "feat2", "feat3", "feat4", "feat5"]
		   featurizer = dc.feat.UserDefinedFeaturizer(features)
		   tasks = ["task0", "task1", "task2", "task3", "task4", "task5"]
		-  input_file = os.path.join(
		-      current_dir, "../../models/tests/feat_multitask_example.csv")
		+  input_file = os.path.join(current_dir,
		+                            "../../models/tests/feat_multitask_example.csv")
		   loader = dc.data.UserCSVLoader(
		       tasks=tasks, featurizer=featurizer, id_field="id")
		   return loader.featurize(input_file)

		+
		 def load_gaussian_cdf_data():
		   """Load example with numbers sampled from Gaussian normal distribution.
		   Each feature and task is a column of values that is sampled
		@@ -98,12 +109,13 @@ def load_gaussian_cdf_data():
		   features = ["feat0", "feat1"]
		   featurizer = dc.feat.UserDefinedFeaturizer(features)
		   tasks = ["task0", "task1"]
		-  input_file = os.path.join(
		-      current_dir, "../../models/tests/gaussian_cdf_example.csv")
		+  input_file = os.path.join(current_dir,
		+                            "../../models/tests/gaussian_cdf_example.csv")
		   loader = dc.data.UserCSVLoader(
		       tasks=tasks, featurizer=featurizer, id_field="id")
		   return loader.featurize(input_file)

		+
		 def load_unlabelled_data():
		   current_dir = os.path.dirname(os.path.abspath(__file__))
		   featurizer = dc.feat.CircularFingerprint(size=1024)
