Unverified commit c0821eff authored by Bharath Ramsundar, committed by GitHub

Merge pull request #1877 from deepchem/examples_first

First Batch of Examples Changes
parents 596c2142 72e70108
+7 −0
# Data Loading Examples

The examples in this directory show several ways to load datasets
into DeepChem for downstream analysis:

- `pandas_csv.py` shows how to load a dataset directly from a CSV file with pandas, without using a `DataLoader` such as `CSVLoader`.
- `sdf_load.py` shows how to load a dataset from an SDF file using `SDFLoader`.
+11 −0
Compound ID,log-solubility,smiles
Amigdalin,0.9740000000000001,OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)C(O)C3O 
Fenfuram,2.885,Cc1occc1C(=O)Nc2ccccc2
citral,2.5789999999999997,CC(C)=CCCC(C)=CC(=O)
Picene,6.617999999999999,c1ccc2c(c1)ccc3c2ccc4c5ccccc5ccc43
Thiophene,2.2319999999999998,c1ccsc1
benzothiazole,2.733,c2ccc1scnc1c2 
"2,2,4,6,6'-PCB",6.545,Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl
Estradiol,4.138,CC12CCC3C(CCc4cc(O)ccc34)C2CCC1O
Dieldrin,4.533,ClC4=C(Cl)C5(Cl)C3C1CC(C2OC12)C3C4(Cl)C5(Cl)Cl
Rotenone,5.246,COc5cc4OCC3Oc2c1CC(Oc1ccc2C(=O)C3c4cc5OC)C(C)=C 
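The CSV above (presumably the `example.csv` that `pandas_csv.py` reads) quotes the compound name `"2,2,4,6,6'-PCB"` because it contains commas, and a few SMILES strings carry trailing spaces. The sketch below, using only the standard library, shows why a real CSV parser is needed: splitting that line on commas yields seven fields instead of three. `pd.read_csv` applies the same double-quote rules by default. The function name `read_rows` is illustrative, not part of DeepChem.

```python
import csv
import io

# A few rows copied from the CSV above; note the quoted compound name.
SAMPLE = """Compound ID,log-solubility,smiles
Fenfuram,2.885,Cc1occc1C(=O)Nc2ccccc2
"2,2,4,6,6'-PCB",6.545,Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl
benzothiazole,2.733,c2ccc1scnc1c2 
"""


def read_rows(text):
    """Parse the CSV with the csv module, which honours quoted fields."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "id": record["Compound ID"],
            "y": float(record["log-solubility"]),
            # Some SMILES in the file end with a space; strip it.
            "smiles": record["smiles"].strip(),
        })
    return rows


rows = read_rows(SAMPLE)
# A naive split breaks the quoted name apart.
naive = SAMPLE.splitlines()[2].split(",")
print(rows[1]["id"])  # 2,2,4,6,6'-PCB
print(len(naive))     # 7
```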
+1165 −0

File added.

+25 −0
# This example shows how to use pandas to load data directly,
# without using a CSVLoader object. This is useful when you want
# the flexibility of processing your data with pandas before
# handing it to DeepChem.
import pandas as pd
import deepchem as dc
from rdkit import Chem

df = pd.read_csv("example.csv")
print("Original data loaded as DataFrame:")
print(df)

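# Featurize each molecule as a 16-element circular (Morgan/ECFP-style) fingerprint.
featurizer = dc.feat.CircularFingerprint(size=16)
mols = [Chem.MolFromSmiles(smiles) for smiles in df["smiles"]]
features = featurizer.featurize(mols)
# Wrap the features, labels, and compound names in an in-memory dataset.
dataset = dc.data.NumpyDataset(
    X=features,
    y=df["log-solubility"].to_numpy(),
    ids=df["Compound ID"].to_numpy())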

print("Data converted into DeepChem Dataset")
print(dataset)

# Now let's convert from a dataset back to a pandas dataframe
converted_df = dataset.to_dataframe()
print("Data converted back into DataFrame:")
print(converted_df)
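The script assumes every SMILES string parses. `Chem.MolFromSmiles` returns `None` when it cannot parse a string, so a single bad row would leave `features` out of step with the `y` and `ids` columns. A minimal sketch of one way to guard against this, assuming you prefer to drop failing rows: filter all three columns together before featurizing. `align_valid_rows` and `toy_parse` are hypothetical helpers; `toy_parse` stands in for `Chem.MolFromSmiles` so the sketch runs without RDKit.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def align_valid_rows(
    smiles: Sequence[str],
    labels: Sequence[float],
    ids: Sequence[str],
    parse: Callable[[str], Optional[object]],
) -> Tuple[List[object], List[float], List[str]]:
    """Keep only rows whose SMILES parses, so mols, labels and ids stay aligned."""
    mols: List[object] = []
    kept_labels: List[float] = []
    kept_ids: List[str] = []
    for smi, label, cid in zip(smiles, labels, ids):
        mol = parse(smi)
        if mol is None:
            # Chem.MolFromSmiles signals a parse failure by returning None;
            # drop the whole row, not just the molecule.
            continue
        mols.append(mol)
        kept_labels.append(label)
        kept_ids.append(cid)
    return mols, kept_labels, kept_ids


def toy_parse(smi: str) -> Optional[str]:
    """Hypothetical stand-in for Chem.MolFromSmiles: rejects empty strings."""
    smi = smi.strip()
    return smi if smi else None


# "bad-row" and its 0.0 label are hypothetical placeholders for a failing row.
mols, y, ids = align_valid_rows(
    ["c1ccsc1", "", "c2ccc1scnc1c2"],
    [2.232, 0.0, 2.733],
    ["Thiophene", "bad-row", "benzothiazole"],
    toy_parse,
)
print(ids)  # ['Thiophene', 'benzothiazole']
```

In the script itself, the call would pass `df["smiles"]`, `df["log-solubility"]`, `df["Compound ID"]` and `Chem.MolFromSmiles`, then featurize only the returned `mols`.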
+6 −0
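# This example shows how to load data from an SDF file into DeepChem.
# The label for each molecule is stored in the SDF data field "LogP(RRCK)".
import deepchem as dc

featurizer = dc.feat.CircularFingerprint(size=16)
# The first argument lists the task names, i.e. the SDF fields used as labels;
# sanitize=True asks RDKit to sanitize each molecule as it is read.
loader = dc.data.SDFLoader(["LogP(RRCK)"], featurizer=featurizer, sanitize=True)
dataset = loader.featurize("membrane_permeability.sdf")
print(dataset)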
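The task name passed to `SDFLoader` refers to an SD data field. In an SDF file, each record is a molblock ending with `M  END`, followed by data items written as a `> <FieldName>` header line, one or more value lines, and a blank line; `$$$$` separates records. The standard-library sketch below parses those data items to show what a field such as `LogP(RRCK)` looks like on disk. It is an illustration only, not how DeepChem reads the file, and `parse_sdf_fields` is a hypothetical name. The one-atom toy record and its `0.0` value are placeholders, not data from the committed file.

```python
import re
from typing import Dict, List, Optional

# Matches an SD data header such as "> <LogP(RRCK)>" and captures the field name.
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def parse_sdf_fields(text: str) -> List[Dict[str, str]]:
    """Return the SD data fields of each record in an SDF string."""
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # text after the final "$$$$" delimiter
        lines = block.splitlines()
        # Data items follow the "M  END" line that closes the molblock.
        end = next((i for i, line in enumerate(lines) if line.startswith("M  END")), None)
        if end is None:
            continue
        fields: Dict[str, str] = {}
        name: Optional[str] = None
        values: List[str] = []
        for line in lines[end + 1:]:
            header = FIELD_HEADER.match(line)
            if header:
                if name is not None:  # previous item had no blank terminator
                    fields[name] = "\n".join(values)
                name, values = header.group(1), []
            elif name is not None:
                if line.strip():
                    values.append(line)
                else:  # a blank line terminates the current data item
                    fields[name] = "\n".join(values)
                    name = None
        if name is not None:
            fields[name] = "\n".join(values)
        records.append(fields)
    return records


# Hypothetical one-atom record; "0.0" is a placeholder value.
TOY_SDF = """placeholder
  toy record

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <LogP(RRCK)>
0.0

$$$$
"""

print(parse_sdf_fields(TOY_SDF))  # [{'LogP(RRCK)': '0.0'}]
```

In practice, RDKit's `Chem.SDMolSupplier` reads these records, and a field's value can be retrieved from each molecule with `mol.GetProp("LogP(RRCK)")`.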