Commit b7d4d7f6 authored by nd-02110114's avatar nd-02110114

🚧 wip fix api references

parent d1667d74
+26 −0
Data Classes
============
DeepChem featurizers often transform their inputs into "data classes". These are
classes that hold all the information needed to train a model on a single data
point. Models then transform these objects into tensors for training in their
:code:`default_generator` methods.
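
As a rough illustration of the idea (not DeepChem's actual classes), a data class can bundle the features of one data point, and a generator-style helper can batch several of them into flat arrays. The names :code:`ToyGraph` and :code:`batch_node_features` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyGraph:
    """Hypothetical data class: everything needed to train on one molecule."""
    node_features: List[List[float]]  # one feature vector per atom
    edges: List[Tuple[int, int]]      # pairs of bonded atom indices


def batch_node_features(graphs: List[ToyGraph]) -> Tuple[List[List[float]], List[int]]:
    """Concatenate node features and record which graph each node came from."""
    features: List[List[float]] = []
    membership: List[int] = []
    for graph_index, graph in enumerate(graphs):
        for vector in graph.node_features:
            features.append(vector)
            membership.append(graph_index)
    return features, membership


# Toy inputs: a single "atomic number" feature per atom.
water = ToyGraph(node_features=[[8.0], [1.0], [1.0]], edges=[(0, 1), (0, 2)])
hydrogen = ToyGraph(node_features=[[1.0], [1.0]], edges=[(0, 1)])
features, membership = batch_node_features([water, hydrogen])
```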

Graph Convolutions
------------------

This section documents the data classes used for graph convolutions. We plan to simplify these classes into a joint data representation for all graph convolutions in a future version of DeepChem, so these APIs may not remain stable.

.. autoclass:: deepchem.feat.mol_graphs.ConvMol
  :members:

.. autoclass:: deepchem.feat.mol_graphs.MultiConvMol
  :members:

.. autoclass:: deepchem.feat.mol_graphs.WeaveMol
  :members:

.. autoclass:: deepchem.feat.graph_data.GraphData
  :members:

.. autoclass:: deepchem.feat.graph_data.BatchGraphData
  :members:
+62 −0
Data Loaders
============

Processing large amounts of input data to construct a :code:`dc.data.Dataset` object can require a fair amount of custom code. The :code:`dc.data.DataLoader` classes simplify this process by providing utilities to load and process large amounts of data.
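
To make the loading pattern concrete, here is a minimal standard-library sketch of reading a CSV file in fixed-size shards while featurizing each row. It is not how the DeepChem loaders are implemented. The function :code:`load_csv_in_shards`, its parameters, and the sample values are all hypothetical.

```python
import csv
import io
from typing import Callable, Iterator, List, TextIO, Tuple


def load_csv_in_shards(
    handle: TextIO,
    feature_field: str,
    task_field: str,
    featurize: Callable[[str], List[float]],
    shard_size: int,
) -> Iterator[Tuple[List[List[float]], List[float]]]:
    """Yield (X, y) shards so the whole file never has to sit in memory."""
    X: List[List[float]] = []
    y: List[float] = []
    for row in csv.DictReader(handle):
        X.append(featurize(row[feature_field]))
        y.append(float(row[task_field]))
        if len(X) == shard_size:
            yield X, y
            X, y = [], []
    if X:  # emit the final, possibly smaller, shard
        yield X, y


# Illustrative data only; the "featurizer" here is just the string length.
csv_text = "smiles,solubility\nCCO,1.0\nCC,0.5\nC,0.25\n"
shards = list(
    load_csv_in_shards(
        io.StringIO(csv_text),
        feature_field="smiles",
        task_field="solubility",
        featurize=lambda s: [float(len(s))],
        shard_size=2,
    )
)
```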


DataLoader
----------

.. autoclass:: deepchem.data.DataLoader
  :members:

CSVLoader
^^^^^^^^^

.. autoclass:: deepchem.data.CSVLoader
  :members:

UserCSVLoader
^^^^^^^^^^^^^

.. autoclass:: deepchem.data.UserCSVLoader
  :members:

JsonLoader
^^^^^^^^^^
JSON is a flexible, human-readable, lightweight file format that is
more compact than other open standard formats like XML. JSON files
are similar to Python dictionaries of key-value pairs. All keys must
be strings, but values can be any of string, number, object, array,
boolean, or null, so the format is more flexible than CSV. JSON is
used to describe structured data and to serialize objects. Pandas
dataframes can be read from and written to JSON with
:code:`pandas.read_json` and :code:`pandas.DataFrame.to_json`.
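
The key and value rules described above can be checked with Python's built-in :code:`json` module. The record below is made up purely for illustration.

```python
import json

# One record exercising every JSON value type (illustrative values only).
record = {
    "smiles": "CCO",                             # string
    "logp": -0.31,                               # number
    "tags": ["alcohol"],                         # array
    "meta": {"source": None, "checked": True},   # object, null, boolean
}

text = json.dumps(record)
restored = json.loads(text)

# JSON keys are always strings: a non-string key is converted on serialization.
int_keyed = json.loads(json.dumps({1: "one"}))
```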

.. autoclass:: deepchem.data.JsonLoader
  :members:

FASTALoader
^^^^^^^^^^^

.. autoclass:: deepchem.data.FASTALoader
  :members:

ImageLoader
^^^^^^^^^^^

.. autoclass:: deepchem.data.ImageLoader
  :members:

SDFLoader
^^^^^^^^^

.. autoclass:: deepchem.data.SDFLoader
  :members:

InMemoryLoader
^^^^^^^^^^^^^^
The :code:`dc.data.InMemoryLoader` is designed to facilitate the processing of large datasets where you already hold the raw data in memory (say, in a pandas dataframe).

.. autoclass:: deepchem.data.InMemoryLoader
  :members:
+41 −0
Datasets
========

DeepChem :code:`dc.data.Dataset` objects are one of the core building blocks of DeepChem programs. :code:`Dataset` objects hold representations of data for machine learning and are widely used throughout DeepChem.

Dataset
-------
The :code:`dc.data.Dataset` class is the abstract parent class for all
datasets. This class should never be instantiated directly, but it
contains a number of useful method implementations.

The goal of the :code:`Dataset` class is to be maximally interoperable with other common representations of machine learning datasets. For this reason we provide interconversion methods mapping from :code:`Dataset` objects to pandas dataframes, TensorFlow datasets, and PyTorch datasets.

.. autoclass:: deepchem.data.Dataset
  :members:

NumpyDataset
------------
The :code:`dc.data.NumpyDataset` class provides an in-memory implementation of the abstract :code:`Dataset` which stores its data in :code:`numpy.ndarray` objects.
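
The relationship between an abstract parent and an in-memory implementation can be sketched in plain Python. This is a simplified stand-in, not the DeepChem source. :code:`ToyDataset`, :code:`ToyInMemoryDataset`, and their methods are hypothetical names, and plain lists stand in for :code:`numpy.ndarray` objects.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Sequence, Tuple


class ToyDataset(ABC):
    """Hypothetical abstract parent: shared helpers live here."""

    @abstractmethod
    def __len__(self) -> int:
        ...

    @abstractmethod
    def itersamples(self) -> Iterator[Tuple[Sequence[float], float]]:
        ...

    def to_rows(self) -> List[dict]:
        """Conversion helper written once and inherited by every subclass."""
        return [{"X": list(x), "y": y} for x, y in self.itersamples()]


class ToyInMemoryDataset(ToyDataset):
    """Hypothetical in-memory implementation backed by Python lists."""

    def __init__(self, X: List[Sequence[float]], y: List[float]) -> None:
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of samples")
        self._X = X
        self._y = y

    def __len__(self) -> int:
        return len(self._X)

    def itersamples(self) -> Iterator[Tuple[Sequence[float], float]]:
        return iter(zip(self._X, self._y))


dataset = ToyInMemoryDataset(X=[[0.0, 1.0], [2.0, 3.0]], y=[0.0, 1.0])
rows = dataset.to_rows()

# The abstract parent itself cannot be instantiated.
try:
    ToyDataset()
    abstract_instantiable = True
except TypeError:
    abstract_instantiable = False
```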

.. autoclass:: deepchem.data.NumpyDataset
  :members:

DiskDataset
-----------
The :code:`dc.data.DiskDataset` class allows for the storage of larger
datasets on disk. Each :code:`DiskDataset` is associated with a
directory in which it writes its contents to disk. Note that a
:code:`DiskDataset` can be very large, so some of the utility methods
to access fields of a :code:`Dataset` can be prohibitively expensive.
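
The trade-off described above can be illustrated with a small sharded store: data is written to several files in one directory and streamed back one shard at a time, so no step needs the full dataset in memory. This is only a sketch under simple assumptions (JSON shard files and the hypothetical helpers :code:`write_shards` and :code:`iter_rows`). It does not reflect the on-disk format :code:`DiskDataset` uses.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator, List


def write_shards(rows: List[dict], data_dir: Path, shard_size: int) -> int:
    """Write rows to numbered JSON shard files; return the number of shards."""
    n_shards = 0
    for start in range(0, len(rows), shard_size):
        shard_path = data_dir / f"shard-{n_shards}.json"
        shard_path.write_text(json.dumps(rows[start:start + shard_size]))
        n_shards += 1
    return n_shards


def iter_rows(data_dir: Path) -> Iterator[dict]:
    """Stream rows shard by shard instead of loading everything at once."""
    shard_paths = sorted(
        data_dir.glob("shard-*.json"),
        key=lambda path: int(path.stem.split("-")[1]),  # numeric shard order
    )
    for shard_path in shard_paths:
        yield from json.loads(shard_path.read_text())


rows = [{"x": i, "y": i % 2} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    n_shards = write_shards(rows, data_dir, shard_size=2)
    streamed = list(iter_rows(data_dir))
```

Collecting every row into one list, as the last line does, is exactly the kind of operation that becomes prohibitively expensive once the data no longer fits in memory.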

.. autoclass:: deepchem.data.DiskDataset
  :members:

ImageDataset
------------
The :code:`dc.data.ImageDataset` class is optimized to allow for convenient processing of image-based datasets.

.. autoclass:: deepchem.data.ImageDataset
  :members:
+74 −0
Docking
=======
Thanks to advances in biophysics, we are often able to determine the
structure of proteins with experimental techniques like cryo-EM or
X-ray crystallography. These structures can be powerful aids in
designing small molecules. Molecular docking performs
geometric calculations to find a "binding pose" in which the small
molecule interacts with the protein in a suitable
binding pocket (that is, a region on the protein which has a groove in
which the small molecule can rest). For more information about
docking, check out the AutoDock Vina paper:

Trott, Oleg, and Arthur J. Olson. "AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading." Journal of Computational Chemistry 31.2 (2010): 455-461.

Binding Pocket Discovery
------------------------

DeepChem has some utilities to help find binding pockets on proteins
automatically. For now, these utilities are simple, but we will
improve these in future versions of DeepChem.

.. autoclass:: deepchem.dock.binding_pocket.BindingPocketFinder
  :members:

.. autoclass:: deepchem.dock.binding_pocket.ConvexHullPocketFinder
  :members:

Pose Generation
---------------
Pose generation is the task of finding a "pose", that is, a geometric
configuration of a small molecule interacting with a protein. Pose
generation is a complex process, so for now DeepChem relies on
external software to perform pose generation. This software is
installed and invoked under the hood.

.. autoclass:: deepchem.dock.pose_generation.PoseGenerator
  :members:

.. autoclass:: deepchem.dock.pose_generation.VinaPoseGenerator
  :members:

Docking
-------
The :code:`dc.dock.docking` module provides a generic docking
implementation that depends on provided pose generation and pose
scoring utilities to perform docking.
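
The "generic" design can be pictured as a function that accepts any pose generator and any pose scorer and simply combines them. The sketch below is a conceptual stand-in, not the :code:`Docker` API. The function :code:`dock`, the toy generator, and the toy scorer are hypothetical, and it assumes lower scores are better.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A pose is represented here as a list of ligand atom coordinates.
Pose = Sequence[Tuple[float, float, float]]


def dock(
    generate_poses: Callable[[], List[Pose]],
    score_pose: Callable[[Pose], float],
) -> List[Tuple[float, Pose]]:
    """Score every generated pose and return them best (lowest score) first."""
    scored = [(score_pose(pose), pose) for pose in generate_poses()]
    return sorted(scored, key=lambda pair: pair[0])


def toy_generator() -> List[Pose]:
    """Hypothetical generator returning two single-atom poses."""
    return [[(3.0, 4.0, 0.0)], [(0.0, 1.0, 0.0)]]


def toy_scorer(pose: Pose) -> float:
    """Hypothetical score: total distance of the atoms from the origin."""
    return sum(math.sqrt(x * x + y * y + z * z) for x, y, z in pose)


ranked = dock(toy_generator, toy_scorer)
best_score, best_pose = ranked[0]
```

Because the generator and scorer are passed in, either one can be swapped without touching the docking routine itself.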

.. autoclass:: deepchem.dock.docking.Docker
  :members:


Pose Scoring
------------
This module contains some utilities for computing docking scoring
functions directly in Python. For now, support for custom pose scoring
is limited.
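
As a minimal pure-Python sketch of two of the building blocks listed below, the following computes all ligand-protein distances and then zeroes out contributions beyond a cutoff. The function names mirror the documented ones, but the bodies are illustrative stand-ins written for this example, and the coordinates are made up.

```python
import math
from typing import List, Sequence

Coordinate = Sequence[float]


def pairwise_distances(
    coords_a: Sequence[Coordinate], coords_b: Sequence[Coordinate]
) -> List[List[float]]:
    """Return a len(coords_a) x len(coords_b) matrix of Euclidean distances."""
    return [[math.dist(a, b) for b in coords_b] for a in coords_a]


def cutoff_filter(
    distances: List[List[float]], values: List[List[float]], cutoff: float
) -> List[List[float]]:
    """Keep values whose distance is below the cutoff; zero out the rest."""
    return [
        [value if distance < cutoff else 0.0
         for distance, value in zip(distance_row, value_row)]
        for distance_row, value_row in zip(distances, values)
    ]


ligand = [(0.0, 0.0, 0.0)]
protein = [(3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]

distances = pairwise_distances(ligand, protein)
interactions = [[1.0, 1.0]]  # placeholder per-pair interaction values
filtered = cutoff_filter(distances, interactions, cutoff=8.0)
```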

.. autofunction:: deepchem.dock.pose_scoring.pairwise_distances

.. autofunction:: deepchem.dock.pose_scoring.cutoff_filter

.. autofunction:: deepchem.dock.pose_scoring.vina_nonlinearity

.. autofunction:: deepchem.dock.pose_scoring.vina_repulsion

.. autofunction:: deepchem.dock.pose_scoring.vina_hydrophobic

.. autofunction:: deepchem.dock.pose_scoring.vina_hbond

.. autofunction:: deepchem.dock.pose_scoring.vina_gaussian_first

.. autofunction:: deepchem.dock.pose_scoring.vina_gaussian_second

.. autofunction:: deepchem.dock.pose_scoring.vina_energy_term
+242 −0
Featurizers
===========

DeepChem contains an extensive collection of featurizers. If you
haven't run into this terminology before, a "featurizer" is a chunk of
code which transforms raw input data into a processed form suitable
for machine learning. Machine learning methods often need data to be
pre-chewed for them to process. Think of this like a mama penguin
chewing up food so the baby penguin can digest it easily.


Now if you've watched a few introductory deep learning lectures, you
might ask, why do we need something like a featurizer? Isn't part of
the promise of deep learning that we can learn patterns directly from
raw data?

Unfortunately it turns out that deep learning techniques need
featurizers just like normal machine learning methods do. Arguably,
they are less dependent on sophisticated featurizers and more capable
of learning sophisticated patterns from simpler data. But
nevertheless, deep learning systems can't simply chew up raw files.
For this reason, :code:`deepchem` provides an extensive collection of
featurization methods which we will review on this page.

Featurizer
----------

The :code:`dc.feat.Featurizer` class is the abstract parent class for all featurizers.
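
The parent/child pattern can be sketched as follows: the parent loops over the raw inputs, and each subclass only defines how a single data point is transformed. This is a simplified stand-in rather than the DeepChem source, and :code:`ToyFeaturizer` and :code:`CharCountFeaturizer` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, List


class ToyFeaturizer(ABC):
    """Hypothetical abstract parent: handles iteration over the inputs."""

    def featurize(self, datapoints: Iterable[Any]) -> List[List[float]]:
        return [self._featurize(datapoint) for datapoint in datapoints]

    @abstractmethod
    def _featurize(self, datapoint: Any) -> List[float]:
        """Transform one raw data point into a feature vector."""


class CharCountFeaturizer(ToyFeaturizer):
    """Hypothetical featurizer: counts chosen characters in a string."""

    def __init__(self, alphabet: str) -> None:
        self.alphabet = alphabet

    def _featurize(self, datapoint: str) -> List[float]:
        return [float(datapoint.count(char)) for char in self.alphabet]


# Raw strings in, fixed-length numeric vectors out.
features = CharCountFeaturizer("CNO").featurize(["CCO", "C=O", "N"])
```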

.. autoclass:: deepchem.feat.Featurizer
  :members:

MolecularFeaturizer
-------------------

Molecular featurizers are those that work with datasets of molecules.

.. autoclass:: deepchem.feat.MolecularFeaturizer
  :members:

Here are some constants that are used by the graph convolutional featurizers for molecules.

.. autoclass:: deepchem.feat.graph_features.GraphConvConstants
  :members:
  :undoc-members:

There are a number of helper methods used by the graph convolutional classes which we document here.

.. autofunction:: deepchem.feat.graph_features.one_of_k_encoding

.. autofunction:: deepchem.feat.graph_features.one_of_k_encoding_unk

.. autofunction:: deepchem.feat.graph_features.get_intervals

.. autofunction:: deepchem.feat.graph_features.safe_index

.. autofunction:: deepchem.feat.graph_features.get_feature_list

.. autofunction:: deepchem.feat.graph_features.features_to_id

.. autofunction:: deepchem.feat.graph_features.id_to_features

.. autofunction:: deepchem.feat.graph_features.atom_to_id
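
For readers new to one-of-k (one-hot) encoding, the two encoding helpers above can be sketched as below. The bodies are written for illustration, not copied from the library. In particular, this sketch assumes the :code:`_unk` variant maps any unseen input to the last entry of the allowable set.

```python
from typing import Any, List, Sequence


def one_of_k_encoding(x: Any, allowable_set: Sequence[Any]) -> List[bool]:
    """Encode x as a one-hot list; reject values outside the allowable set."""
    if x not in allowable_set:
        raise ValueError(f"input {x!r} not in allowable set {list(allowable_set)}")
    return [x == candidate for candidate in allowable_set]


def one_of_k_encoding_unk(x: Any, allowable_set: Sequence[Any]) -> List[bool]:
    """Like one_of_k_encoding, but map unseen inputs to the last entry."""
    if x not in allowable_set:
        x = allowable_set[-1]
    return [x == candidate for candidate in allowable_set]


strict = one_of_k_encoding("N", ["C", "N", "O"])
lenient = one_of_k_encoding_unk("Xe", ["C", "N", "Unknown"])
```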

This function helps compute distances between atoms from a given base atom.

.. autofunction:: deepchem.feat.graph_features.find_distance
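
One common way to compute graph distances (numbers of bonds) from a base atom is a breadth-first search over the bond graph. The sketch below shows that general technique with a hypothetical helper, :code:`graph_distances`. It is not the signature or implementation of :code:`find_distance`.

```python
from collections import deque
from typing import Dict, List


def graph_distances(
    adjacency: Dict[int, List[int]], base: int, max_distance: int
) -> Dict[int, int]:
    """Breadth-first search: bond counts from `base`, up to `max_distance`."""
    distances = {base: 0}
    queue = deque([base])
    while queue:
        atom = queue.popleft()
        if distances[atom] == max_distance:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency[atom]:
            if neighbor not in distances:
                distances[neighbor] = distances[atom] + 1
                queue.append(neighbor)
    return distances


# A four-atom chain 0-1-2-3 (for example, the carbon backbone of butane).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
within_two_bonds = graph_distances(chain, base=0, max_distance=2)
```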

This function computes the per-atom feature vectors used by graph
convolutional featurizers.

.. autofunction:: deepchem.feat.graph_features.atom_features

This function computes the bond features used by graph convolutional
featurizers.

.. autofunction:: deepchem.feat.graph_features.bond_features

This function computes atom-atom features (for atom pairs which may not have bonds between them).

.. autofunction:: deepchem.feat.graph_features.pair_features

ConvMolFeaturizer
^^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.ConvMolFeaturizer
  :members:

WeaveFeaturizer
^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.WeaveFeaturizer
  :members:

CircularFingerprint
^^^^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.CircularFingerprint
  :members:

Mol2VecFingerprint
^^^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.Mol2VecFingerprint
  :members:

RDKitDescriptors
^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.RDKitDescriptors
  :members:

MordredDescriptors
^^^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.MordredDescriptors
  :members:

CoulombMatrix
^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.CoulombMatrix
  :members:

CoulombMatrixEig
^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.CoulombMatrixEig
  :members:

AtomicCoordinates
^^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.AtomicCoordinates
  :members:

SmilesToSeq
^^^^^^^^^^^

.. autoclass:: deepchem.feat.SmilesToSeq
  :members:

SmilesToImage
^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.SmilesToImage
  :members:

OneHotFeaturizer
^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.OneHotFeaturizer
  :members:

ComplexFeaturizer
-----------------

The :code:`dc.feat.ComplexFeaturizer` class is the abstract parent class for all featurizers that work with three dimensional molecular complexes. 


.. autoclass:: deepchem.feat.ComplexFeaturizer
  :members:

RdkitGridFeaturizer
^^^^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.RdkitGridFeaturizer
  :members:

AtomConvFeaturizer
^^^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.NeighborListComplexAtomicCoordinates
  :members:

MaterialStructureFeaturizer
---------------------------

Material structure featurizers work with datasets of crystals with
periodic boundary conditions. For inorganic crystal structures, these
featurizers operate on :code:`pymatgen.Structure` objects, which include a
lattice and 3D coordinates that specify a periodic crystal structure.
Structure featurizers are not designed to work with molecules.

.. autoclass:: deepchem.feat.MaterialStructureFeaturizer
  :members:

SineCoulombMatrix
^^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.SineCoulombMatrix
  :members:

CGCNNFeaturizer
^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.CGCNNFeaturizer
  :members:

MaterialCompositionFeaturizer
-----------------------------

Material composition featurizers work with datasets of crystal
compositions. For inorganic crystal structures, these featurizers operate
on chemical compositions (e.g. "MoS2"). They should be applied to systems
that have periodic boundary conditions. Composition featurizers are not
designed to work with molecules.
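
To show what "operating on a composition" can mean, the sketch below turns a simple formula string such as "MoS2" into element counts and atomic fractions. It is a toy parser under stated assumptions (no parentheses, integer subscripts only). The helpers :code:`parse_formula` and :code:`atomic_fractions` are hypothetical and are not part of DeepChem or pymatgen.

```python
import re
from collections import Counter
from typing import Dict

# An element symbol (capital letter plus optional lowercase letter)
# followed by an optional integer subscript.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Dict[str, int]:
    """Count atoms of each element in a simple formula like 'MoS2'."""
    counts: Counter = Counter()
    for element, amount in _TOKEN.findall(formula):
        counts[element] += int(amount) if amount else 1
    return dict(counts)


def atomic_fractions(formula: str) -> Dict[str, float]:
    """Normalize element counts so that the fractions sum to one."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return {element: count / total for element, count in counts.items()}


counts = parse_formula("MoS2")
fractions = atomic_fractions("MoS2")
```

A composition featurizer would typically go one step further and combine such fractions with per-element properties to build a fixed-length vector.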

.. autoclass:: deepchem.feat.MaterialCompositionFeaturizer
  :members:

ElementPropertyFingerprint
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: deepchem.feat.ElementPropertyFingerprint
  :members:

BindingPocketFeaturizer
-----------------------

.. autoclass:: deepchem.feat.BindingPocketFeaturizer
  :members:

UserDefinedFeaturizer
---------------------

.. autoclass:: deepchem.feat.UserDefinedFeaturizer
  :members:

BPSymmetryFunctionInput
-----------------------

.. autoclass:: deepchem.feat.BPSymmetryFunctionInput
  :members:

RawFeaturizer
-------------

.. autoclass:: deepchem.feat.RawFeaturizer
  :members: