Commit 891c4840 authored by Nathan Frey

Expanded docstrings and added graph unittest

parent 99fe8027
+3 −1
@@ -47,12 +47,14 @@ DeepChem has a number of "soft" requirements. These are packages which are neede

- [BioPython](https://biopython.org/wiki/Documentation)
- [OpenAI Gym](https://gym.openai.com/)
- [matminer](https://hackingmaterials.lbl.gov/matminer/)
- [MDTraj](http://mdtraj.org/)
- [NetworkX](https://networkx.github.io/documentation/stable/index.html)
- [OpenMM](http://openmm.org/)
- [PDBFixer](https://github.com/pandegroup/pdbfixer)
- [Pillow](https://pypi.org/project/Pillow/)
- [pyGPGO](https://pygpgo.readthedocs.io/en/latest/)
- [Pymatgen](https://pymatgen.org/)
- [PyTorch](https://pytorch.org/)
- [RDKit](http://www.rdkit.org/docs/Install.html)
- [simdna](https://github.com/kundajelab/simdna)
@@ -209,7 +211,7 @@ sudo apt-get install -y libxrender-dev

## Getting Started

The DeepChem project maintains an extensive colelction of [tutorials](https://github.com/deepchem/deepchem/tree/master/examples/tutorials). All tutorials are designed to be run on Google colab (or locally if you prefer). Tutorials are arranged in a suggested learning sequence which will take you from beginner to proficient at molecular machine learning and computational biology more broadly.
The DeepChem project maintains an extensive collection of [tutorials](https://github.com/deepchem/deepchem/tree/master/examples/tutorials). All tutorials are designed to be run on Google colab (or locally if you prefer). Tutorials are arranged in a suggested learning sequence which will take you from beginner to proficient at molecular machine learning and computational biology more broadly.

After working through the tutorials, you can also go through other [examples](https://github.com/deepchem/deepchem/tree/master/examples). To apply `deepchem` to a new problem, try starting from one of the existing examples or tutorials and modifying it step by step to work with your new use-case. If you have questions or comments you can raise them on our [gitter](https://gitter.im/deepchem/Lobby).

+43 −8
@@ -17,7 +17,7 @@ class ChemicalFingerprint(Featurizer):
  based on elemental stoichiometry. E.g., the average electronegativity
  of atoms in a crystal structure. The chemical fingerprint is a 
  vector of these statistics. For a full list of properties and statistics,
  see ElementProperty(data_source).feature_labels().
  see ``matminer.featurizers.composition.ElementProperty(data_source).feature_labels()``.

  This featurizer requires the optional dependencies pymatgen and
  matminer. It may be useful when only crystal compositions are available
@@ -56,6 +56,12 @@ class ChemicalFingerprint(Featurizer):
    comp : str
      Reduced formula of crystal.

    Returns
    -------
    feats: np.ndarray
      Vector of properties and statistics derived from chemical
      stoichiometry.

    """

    from pymatgen import Composition
@@ -118,7 +124,13 @@ class SineCoulombMatrix(Featurizer):
    Parameters
    ----------
    struct : dict
      pymatgen structure dictionary
      Json-serializable dictionary representation of pymatgen.core.structure
      https://pymatgen.org/pymatgen.core.structure.html

    Returns
    -------
    features: np.ndarray
      2D sine Coulomb matrix, or 1D matrix eigenvalues. 

    """

@@ -136,7 +148,16 @@ class SineCoulombMatrix(Featurizer):

    Parameters
    ----------
    s : pymatgen structure
    s : pymatgen.core.structure
      A periodic crystal composed of a lattice and a sequence of atomic
      sites with 3D coordinates and elements.

    Returns
    -------
    eigs: np.ndarray
      1D matrix eigenvalues. 
    sine_mat: np.ndarray
      2D sine Coulomb matrix.

    """

@@ -167,7 +188,8 @@ class SineCoulombMatrix(Featurizer):
      eigs, _ = np.linalg.eig(sine_mat)
      zeros = np.zeros((self.max_atoms,))
      zeros[:len(eigs)] = eigs
      return zeros
      eigs = zeros
      return eigs
    else:
      sine_mat = pad_array(sine_mat, self.max_atoms)
      return sine_mat
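
The eigenvalue branch in this hunk zero-pads the spectrum to a fixed length so that every structure yields a vector of the same size. A minimal standalone sketch of that padding step, assuming a real symmetric input matrix; the function name `padded_eigenvalues` and the 2x2 example matrix are illustrative and not part of the commit:

```python
import numpy as np


def padded_eigenvalues(mat, max_atoms):
    """Return the eigenvalues of a square matrix, zero-padded to max_atoms.

    Illustrative helper, not the DeepChem implementation. Assumes the
    matrix has at most max_atoms rows and real eigenvalues.
    """
    eigs, _ = np.linalg.eig(mat)
    padded = np.zeros((max_atoms,))
    # Copy the computed eigenvalues into the front of the zero vector.
    padded[:len(eigs)] = eigs
    return padded


# Diagonal matrix, so the eigenvalues are simply the diagonal entries.
mat = np.array([[2.0, 0.0], [0.0, 3.0]])
result = padded_eigenvalues(mat, max_atoms=4)
```

If the matrix has more rows than `max_atoms`, the slice assignment raises a `ValueError`, so callers would need to choose `max_atoms` at least as large as the biggest structure in the dataset.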
@@ -216,7 +238,13 @@ class StructureGraphFeaturizer(Featurizer):
    Parameters
    ----------
    struct : dict
      pymatgen structure dictionary.
      Json-serializable dictionary representation of pymatgen.core.structure
      https://pymatgen.org/pymatgen.core.structure.html

    Returns
    -------
    feats: tuple
      atomic numbers, nodes, and edges in networkx.classes.multidigraph.MultiDiGraph format.

    """

@@ -235,7 +263,14 @@ class StructureGraphFeaturizer(Featurizer):

    Parameters
    ----------
    struct : pymatgen structure
    struct : pymatgen.core.structure
      A periodic crystal composed of a lattice and a sequence of atomic
      sites with 3D coordinates and elements.

    Returns
    -------
    feats: tuple
      atomic numbers, nodes, and edges in networkx.classes.multidigraph.MultiDiGraph format.
    
    """

@@ -244,7 +279,7 @@ class StructureGraphFeaturizer(Featurizer):
    atom_features = np.array([site.specie.Z for site in struct], dtype='int32')

    sg = StructureGraph.with_local_env_strategy(struct, self.strategy)
    nodes = np.asarray(sg.graph.nodes)
    edges = np.asarray(sg.graph.edges)
    nodes = np.array(list(sg.graph.nodes))
    edges = np.array(list(sg.graph.edges))

    return (atom_features, nodes, edges)
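
The hunk above replaces `np.asarray(view)` with `np.array(list(view))`. The commit does not state the motivation; one plausible reason is that NumPy does not always treat view objects as sequences. The sketch below uses a plain `dict` key view as a stand-in, which is an assumption for illustration only: networkx views may behave differently depending on library versions.

```python
import numpy as np

# Stand-in for a graph: node -> list of neighbours. Not a networkx object.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
view = adjacency.keys()

# A dict view is iterable but not a sequence, so NumPy wraps the whole
# view in a single 0-dimensional object array.
wrapped = np.asarray(view)

# Materializing the view as a list first gives one array element per node.
materialized = np.array(list(view))
```

Converting to a list first makes the resulting shape independent of how NumPy interprets the view type.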
+12 −0
@@ -68,3 +68,15 @@ class TestMaterialFeaturizers(unittest.TestCase):

    assert len(features) == 1
    assert np.isclose(features[0], 1244, atol=.5)

  def testSGF(self):
    """
    Test StructureGraphFeaturizer.
    """

    featurizer = StructureGraphFeaturizer()
    features = featurizer.featurize([self.struct_dict])

    assert len(features[0]) == 3
    assert features[0][0] == 26
    assert len(features[0][2]) == 6
+9 −0
@@ -116,6 +116,15 @@ AtomConvFeaturizer
.. autoclass:: deepchem.feat.NeighborListComplexAtomicCoordinates
  :members:

MaterialsFeaturizers
-------------------

Materials Featurizers are those that work with datasets of inorganic crystals.
These featurizers operate on chemical compositions (e.g. "MoS2"), or on a
lattice and 3D coordinates that specify a periodic crystal structure. They
should be applied on systems that have periodic boundary conditions. Materials
featurizers are not designed to work with molecules. 
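
As a rough illustration of a composition-based statistic such as the average electronegativity mentioned in the ChemicalFingerprint docstring, the sketch below computes a stoichiometry-weighted mean for "MoS2". The helpers `parse_formula` and `weighted_mean` and the two-element table are hypothetical; this is not how matminer or DeepChem compute their features.

```python
import re

# Pauling electronegativities for two elements only; a real featurizer
# would draw on a full elemental data source.
ELECTRONEGATIVITY = {"Mo": 2.16, "S": 2.58}


def parse_formula(formula):
    """Parse a flat formula such as "MoS2" into {element: amount}.

    Hypothetical helper: parentheses, charges, and fractional amounts
    are not handled.
    """
    amounts = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        amounts[element] = amounts.get(element, 0.0) + float(count or 1)
    return amounts


def weighted_mean(formula, table):
    """Mean of an elemental property, weighted by stoichiometric amount."""
    amounts = parse_formula(formula)
    total = sum(amounts.values())
    return sum(table[el] * n for el, n in amounts.items()) / total


composition = parse_formula("MoS2")
mean_en = weighted_mean("MoS2", ELECTRONEGATIVITY)
```

Because only the formula is needed, a statistic like this can be computed even when no crystal structure is available.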

ChemicalFingerprint
^^^^^^^^^^^^^^^^^^^

+2 −1
@@ -33,7 +33,8 @@ conda install -y -q -c deepchem -c rdkit -c conda-forge -c omnia \
    py-xgboost \
    rdkit \
    simdna \
    pymatgen \
    pytest \
    pytest-cov \
    flaky
yes | pip install -U tensorflow tensorflow-probability
yes | pip install -U matminer tensorflow tensorflow-probability