Commit 565fb2f9 authored by Bharath Ramsundar

Changes

parent 4cbaac01
+4 −0
# Data Loading Examples

The examples in this directory highlight a number of ways to
load datasets into DeepChem for downstream analysis.
+11 −0
Compound ID,log-solubility,smiles
Amigdalin,0.9740000000000001,OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)C(O)C3O 
Fenfuram,2.885,Cc1occc1C(=O)Nc2ccccc2
citral,2.5789999999999997,CC(C)=CCCC(C)=CC(=O)
Picene,6.617999999999999,c1ccc2c(c1)ccc3c2ccc4c5ccccc5ccc43
Thiophene,2.2319999999999998,c1ccsc1
benzothiazole,2.733,c2ccc1scnc1c2 
"2,2,4,6,6'-PCB",6.545,Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl
Estradiol,4.138,CC12CCC3C(CCc4cc(O)ccc34)C2CCC1O
Dieldrin,4.533,ClC4=C(Cl)C5(Cl)C3C1CC(C2OC12)C3C4(Cl)C5(Cl)Cl
Rotenone,5.246,COc5cc4OCC3Oc2c1CC(Oc1ccc2C(=O)C3c4cc5OC)C(C)=C 
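Two details of this file affect how it must be parsed. The compound name `"2,2,4,6,6'-PCB"` contains commas and is therefore quoted, so a plain `split(",")` breaks that row apart. A few SMILES strings (for example, Amigdalin's) also end with a trailing space. The sketch below uses Python's standard `csv` module on a few rows copied from the file as a dependency-free stand-in for `pd.read_csv`. The `read_records` helper is hypothetical, and stripping whitespace from the SMILES column is our own precaution, not something these examples are known to require.

```python
import csv
import io

# A few rows copied from the CSV above. Note the quoted compound name that
# contains commas, and the trailing space after Amigdalin's SMILES.
SAMPLE = (
    "Compound ID,log-solubility,smiles\n"
    "Amigdalin,0.9740000000000001,OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)C(O)C3O \n"
    "\"2,2,4,6,6'-PCB\",6.545,Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl\n"
    "Thiophene,2.2319999999999998,c1ccsc1\n"
)


def read_records(text):
    """Hypothetical helper: parse the CSV into (compound_id, log_solubility, smiles) tuples."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append((
            row["Compound ID"],           # quoting is handled by the csv module
            float(row["log-solubility"]),
            row["smiles"].strip(),        # drop trailing whitespace present in some rows
        ))
    return records


# A naive comma split of the PCB row yields 7 fields instead of 3.
naive_fields = SAMPLE.splitlines()[2].split(",")

for compound_id, log_solubility, smiles in read_records(SAMPLE):
    print(f"{compound_id:>16}  {log_solubility:6.3f}  {smiles}")
```

`pd.read_csv` respects the same quoting rules, so the Pandas example below reads the PCB row correctly without extra work.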
+1165 −0

File added (contents not shown because the diff preview size limit was exceeded).

+22 −0
# This example shows how to use Pandas to load data directly
# without using a CSVLoader object. This may be useful if you
# want the flexibility of processing your data with Pandas
# directly.
import pandas as pd
import deepchem as dc

# Read the raw CSV (compound IDs, measured log-solubility, SMILES).
df = pd.read_csv("example.csv")
print("Original data loaded as DataFrame:")
print(df)

# Featurize each SMILES string as an ECFP-style circular fingerprint.
# size=16 keeps the printed output short; the default size is 2048.
featurizer = dc.feat.CircularFingerprint(size=16)
features = featurizer.featurize(df["smiles"])

# Wrap features, labels, and IDs in an in-memory NumpyDataset.
# .to_numpy() passes plain NumPy arrays rather than pandas Series.
dataset = dc.data.NumpyDataset(
    X=features,
    y=df["log-solubility"].to_numpy(),
    ids=df["Compound ID"].to_numpy(),
)

print("Data converted into DeepChem Dataset:")
print(dataset)

# Now convert the dataset back into a pandas DataFrame.
converted_df = dataset.to_dataframe()
print("Data converted back into DataFrame:")
print(converted_df)
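One case the Pandas example does not handle is a SMILES string that fails to featurize. As we understand recent DeepChem featurizers, such a molecule yields an empty array instead of a fingerprint, which leaves `features` misaligned with the labels and IDs. `CSVLoader`, as far as we know, drops these rows for you. The sketch below is a hypothetical helper, `drop_failed_rows`, written with NumPy only. The placeholder feature vectors are made up, and the empty-array convention is an assumption to verify against your DeepChem version. The filtered arrays could then be passed to `dc.data.NumpyDataset(X=X, y=y, ids=ids)`.

```python
import numpy as np


def drop_failed_rows(features, y, ids, n_features):
    """Hypothetical helper: keep only rows whose feature vector has n_features entries.

    Assumes a failed featurization appears as an empty array, and drops the
    matching label and ID so that X, y, and ids stay aligned row by row.
    """
    keep = [i for i, f in enumerate(features) if np.size(f) == n_features]
    if keep:
        X = np.stack([np.asarray(features[i], dtype=float) for i in keep])
    else:
        X = np.empty((0, n_features))
    return X, np.asarray(y)[keep], np.asarray(ids)[keep]


# Usage with made-up 4-bit "fingerprints"; the middle molecule simulates a failure.
features = [np.ones(4), np.array([]), np.zeros(4)]
labels = [0.974, 2.885, 2.579]
names = ["Amigdalin", "Fenfuram", "citral"]
X, y_kept, ids_kept = drop_failed_rows(features, labels, names, n_features=4)
print(X.shape, ids_kept.tolist())  # (2, 4) ['Amigdalin', 'citral']
```

Filtering by expected length instead of by "non-empty" also catches any row whose vector has an unexpected size.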
+3 −0
# Dataset Examples

This folder contains examples of working with DeepChem datasets.