Commit 73a5c029 authored by Bharath Ramsundar

splitters/transformers

parent 3de71962
+2 −1
@@ -84,6 +84,7 @@ for name in ['sphinx.ext.linkcode', 'numpydoc.linkcode']:
  else:
    print("NOTE: linkcode extension not found -- no links to source generated")


def linkcode_resolve(domain, info):
  """
  Determine the URL corresponding to Python object
+3 −1
@@ -5,7 +5,9 @@ DeepChem :code:`dc.data.Dataset` objects are one of the core building blocks of

Dataset
-------
The :code:`dc.data.Dataset` class is the abstract parent clss for all datasets. This class should never be directly initialized, but contains a number of useful method implementations.
The :code:`dc.data.Dataset` class is the abstract parent class for all
datasets. This class should never be directly initialized, but
contains a number of useful method implementations.

The goal of the :code:`Dataset` class is to be maximally interoperable with other common representations of machine learning datasets. For this reason we provide interconversion methods mapping from :code:`Dataset` objects to pandas dataframes, tensorflow Datasets, and PyTorch datasets.

+17 −4
@@ -18,9 +18,13 @@ The DeepChem Project
What is DeepChem?
-----------------

The DeepChem project aims to build high quality tools to democratize the use of deep learning in the sciences. The core `DeepChem Repo`_ serves as a monorepo that organizes the DeepChem suite of scientific tools. As the project matures, smaller more focused tool will be surfaced in more targeted repos.
The DeepChem project aims to build high quality tools to democratize
the use of deep learning in the sciences. The origin of DeepChem
focused on applications of deep learning to chemistry, but the project
has slowly evolved past its roots to broader applications of deep
learning to the sciences.

DeepChem is primarily developed in Python, but we are experimenting with adding support for other languages.
The core `DeepChem Repo`_ serves as a monorepo that organizes the DeepChem suite of scientific tools. As the project matures, smaller, more focused tools will be surfaced in more targeted repos. DeepChem is primarily developed in Python, but we are experimenting with adding support for other languages.

What are some of the things you can use DeepChem to do? Here's a few examples:

@@ -29,12 +33,17 @@ What are some of the things you can use DeepChem to do? Here's a few examples:
- Predict physical properties of simple materials
- Analyze protein structures and extract useful descriptors
- Count the number of cells in a microscopy image
- More coming soon...

We should clarify one thing up front though. DeepChem is a machine
learning library, so it gives you the tools to solve each of the
applications mentioned above yourself. DeepChem may or may not have
prebaked models which can solve these problems out of the box.

Over time, we hope to grow the set of scientific applications DeepChem
can address. This means we need lots of help! If you're a scientist
who's interested in open source, please pitch in on building DeepChem.

Quick Start
-----------

@@ -47,6 +56,7 @@ If you'd like to install DeepChem locally, we recommend using
DeepChem with the one-liner

.. code-block:: bash

    conda install -y -c deepchem -c rdkit -c conda-forge -c omnia deepchem-gpu=2.3.0

Then open your python and try running.
@@ -76,7 +86,8 @@ DeepChem developers.

That said, we would very much appreciate a citation if you find our tools useful. You can cite DeepChem with the following reference.

.. highlight:: guess
.. code-block:: guess

  @book{Ramsundar-et-al-2019,
      title={Deep Learning for the Life Sciences},
      author={Bharath Ramsundar and Peter Eastman and Patrick Walters and Vijay Pande and Karl Leswing and Zhenqin Wu},
@@ -96,7 +107,7 @@ discussions about research, development or any general questions. If you'd like
.. _`on GitHub`: https://github.com/deepchem/deepchem
.. _`Gitter`: https://gitter.im/deepchem/Lobby

.. important:: Join our `community gitter <https://forms.gle/9TSdDYUgxYs8SA9e8>`_ to discuss DeepChem. Sign up for our `forums <https://forum.deepchem.io/>`_ to ralk about research, development, and general questions. 
.. important:: Join our `community gitter <https://forms.gle/9TSdDYUgxYs8SA9e8>`_ to discuss DeepChem. Sign up for our `forums <https://forum.deepchem.io/>`_ to talk about research, development, and general questions. 

.. toctree::
   :maxdepth: 4
@@ -108,5 +119,7 @@ discussions about research, development or any general questions. If you'd like
   Datasets <datasets>
   Data Loaders <dataloaders>
   Featurizers <featurizers>
   Splitters <splitters>
   Transformers <transformers>
   Models <models>
   Introduction to Keras <keras>
+1 −0
@@ -15,6 +15,7 @@ If you'd like to install DeepChem locally, we recommend using
DeepChem with the one-liner

.. code-block:: bash

    conda install -y -c deepchem -c rdkit -c conda-forge -c omnia deepchem-gpu=2.3.0

Then open your python and try running.

docs/splitters.rst

0 → 100644
+109 −0
Splitters
=========
DeepChem :code:`dc.splits.Splitter` objects are a tool to meaningfully
split DeepChem datasets for machine learning testing. The core idea is
that when evaluating a machine learning model, it's useful to create
training, validation and test splits of your source data. The training
split is used to train models, and the validation split is used to
benchmark different model architectures. The test split is ideally
held out till the very end, when it's used to gauge a final estimate
of the model's performance.
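
As a rough illustration of the idea above, the following sketch
shuffles sample indices and cuts them into three disjoint splits. It
is plain Python and is not DeepChem's implementation; the function
name :code:`sketch_random_split` and its parameters are assumptions
made for this example only.

```python
import random


def sketch_random_split(n_samples, frac_train=0.8, frac_valid=0.1, seed=0):
    """Return (train, valid, test) lists of indices into range(n_samples).

    Hypothetical helper for illustration only; not part of DeepChem.
    Whatever is left after the train and valid fractions goes to test.
    """
    indices = list(range(n_samples))
    # A seeded generator makes the shuffle reproducible.
    random.Random(seed).shuffle(indices)
    n_train = int(frac_train * n_samples)
    n_valid = int(frac_valid * n_samples)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


train, valid, test = sketch_random_split(100)
print(len(train), len(valid), len(test))
```

The three index lists never overlap, so a model fit on the training
indices is never scored on samples it has already seen.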

The :code:`dc.splits` module contains a collection of scientifically
aware splitters. In many cases, we want to evaluate scientific deep
learning models more rigorously than standard deep models since we're
looking for the ability to generalize to new domains. Some of the
splitters implemented here may help with such evaluations.
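
One generic way to test generalization to new domains is to keep
every sample that shares a group label (for example, a shared
chemical family) inside a single split, so the held-out data contains
only unseen groups. The sketch below shows that idea in plain Python.
It is an assumption-laden illustration, not the algorithm used by any
particular DeepChem splitter, and :code:`sketch_group_split` is a
hypothetical name.

```python
from collections import defaultdict


def sketch_group_split(groups, frac_train=0.8):
    """Split indices so that no group label appears in both splits.

    Hypothetical helper for illustration only; not part of DeepChem.
    Groups are placed greedily, largest first, into train until the
    next group would push train past frac_train of the samples.
    """
    members = defaultdict(list)
    for index, group in enumerate(groups):
        members[group].append(index)
    # Largest groups first, so train fills with the common groups.
    ordered = sorted(members.values(), key=len, reverse=True)
    cutoff = frac_train * len(groups)
    train, test = [], []
    for indices in ordered:
        if len(train) + len(indices) <= cutoff:
            train.extend(indices)
        else:
            test.extend(indices)
    return train, test


labels = ["a", "a", "a", "b", "b", "c", "d", "a", "b", "e"]
group_train, group_test = sketch_group_split(labels)
print(sorted(group_train), sorted(group_test))
```

Because whole groups move together, the test score reflects
performance on groups the model never saw during training, which is
usually a harder and more honest benchmark than a purely random split.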

Splitter
--------
The :code:`dc.splits.Splitter` class is the abstract parent class for
all splitters. This class should never be directly instantiated.

.. autoclass:: deepchem.splits.Splitter
  :members:

RandomSplitter
--------------

.. autoclass:: deepchem.splits.RandomSplitter
  :members:

IndexSplitter
-------------

.. autoclass:: deepchem.splits.IndexSplitter
  :members:

IndiceSplitter
--------------

.. autoclass:: deepchem.splits.IndiceSplitter
  :members:

SpecifiedSplitter
-----------------

.. autoclass:: deepchem.splits.SpecifiedSplitter
  :members:

SpecifiedIndexSplitter
----------------------

.. autoclass:: deepchem.splits.SpecifiedIndexSplitter
  :members:


RandomGroupSplitter
-------------------

.. autoclass:: deepchem.splits.RandomGroupSplitter
  :members:

RandomStratifiedSplitter
------------------------

.. autoclass:: deepchem.splits.RandomStratifiedSplitter
  :members:

SingletaskStratifiedSplitter
----------------------------

.. autoclass:: deepchem.splits.SingletaskStratifiedSplitter
  :members:

MolecularWeightSplitter
-----------------------

.. autoclass:: deepchem.splits.MolecularWeightSplitter
  :members:

MaxMinSplitter
--------------

.. autoclass:: deepchem.splits.MaxMinSplitter
  :members:

ButinaSplitter
--------------

.. autoclass:: deepchem.splits.ButinaSplitter
  :members:

ScaffoldSplitter
----------------

.. autoclass:: deepchem.splits.ScaffoldSplitter
  :members:

FingerprintSplitter
-------------------

.. autoclass:: deepchem.splits.FingerprintSplitter
  :members:

TimeSplitterPDBbind
-------------------

.. autoclass:: deepchem.splits.TimeSplitterPDBbind
  :members: