Commit 34262936 authored by Jun Zhao

update website

parent d3dfd543
+1 −1
# DA-seq

## Introduction
-DA-seq is a method to detect cell subpopulations with differential abundance between single cell RNA-seq (scRNA-seq) datasets from different samples, described in the preprint, "Detection of differentially abundant cell subpopulations discriminates biological states in scRNA-seq data", preprint available [here](https://www.biorxiv.org/content/10.1101/711929v3). Given a low dimensional transformation, for example principal component analysis (PCA), of the merged gene expression matrices from different samples (biological states, conditions, etc.), DA-seq first computes a score vector for each cell to represent the DA behavior in the neighborhood to select cells in the most DA neighborhoods; then groups these cells into distinct DA cell subpopulations.
+DA-seq is a method to detect cell subpopulations with differential abundance between single cell RNA-seq (scRNA-seq) datasets from different samples, described in the preprint, "Detection of differentially abundant cell subpopulations discriminates biological states in scRNA-seq data", available [here](https://www.biorxiv.org/content/10.1101/711929v3). Given a low dimensional transformation, for example principal component analysis (PCA), of the merged gene expression matrices from different samples (biological states, conditions, etc.), DA-seq first computes a score vector for each cell to represent the DA behavior in the neighborhood to select cells in the most DA neighborhoods; then groups these cells into distinct DA cell subpopulations.

[This](https://github.com/KlugerLab/DAseq) repository contains the DA-seq package.

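The score-vector step described in the README can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not code from the DAseq package: it assumes each entry of a cell's score vector is the fraction of its k nearest neighbors (Euclidean distance in the low-dimensional embedding, e.g. the top PCs) that come from condition 1, computed for several values of k. The function name `da_score_vectors` and the `k_values` argument are hypothetical, and the brute-force distance matrix is only practical for small numbers of cells.

```python
import numpy as np


def da_score_vectors(embedding, labels, k_values):
    """Hypothetical sketch: per-cell fraction of k-NN from condition 1, for each k.

    embedding: (n_cells, n_dims) array, e.g. top PCs of the merged data
    labels:    (n_cells,) array of 0/1 condition labels
    k_values:  neighborhood sizes, each smaller than n_cells
    """
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    # Brute-force pairwise squared Euclidean distances (fine for small n).
    sq = np.sum(embedding ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    # Column j of `order` holds each cell's (j+1)-th nearest neighbor.
    order = np.argsort(dist, axis=1)
    scores = np.empty((len(labels), len(k_values)))
    for col, k in enumerate(k_values):
        neighbors = order[:, :k]
        # Assumed score: plain fraction of neighbors labeled 1 (no size adjustment).
        scores[:, col] = labels[neighbors].mean(axis=1)
    return scores
```

Cells in regions dominated by one condition get score vectors near all-0 or all-1, while well-mixed regions land roughly at the local label proportion; using several k values lets the vector describe each neighborhood at more than one scale.
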
+3 −3
@@ -75,13 +75,13 @@

      </header><div class="row">
  <div class="contents col-md-9">
-<div id="da-seq-detecting-regions-of-differential-abundance-between-scrna-seq-datasets" class="section level1">
+<div id="da-seq" class="section level1">
<div class="page-header"><h1 class="hasAnchor">
-<a href="#da-seq-detecting-regions-of-differential-abundance-between-scrna-seq-datasets" class="anchor"></a>DA-seq (Detecting regions of differential abundance between scRNA-seq datasets)</h1></div>
+<a href="#da-seq" class="anchor"></a>DA-seq</h1></div>
<div id="introduction" class="section level2">
<h2 class="hasAnchor">
<a href="#introduction" class="anchor"></a>Introduction</h2>
-<p>DA-seq is a method to detect cell subpopulations with differential abundance between single cell RNA-seq (scRNA-seq) datasets from different samples, described in the preprint, “Detecting regions of differential abundance between scRNA-Seq datasets” available <a href="https://www.biorxiv.org/content/10.1101/711929v2">here</a>. Given a low dimensional transformation, for example principal component analysis (PCA), of the merged gene expression matrices from different samples (cell states, condition, etc.), DA-seq first computes a score vector for each cell to represent the DA behavior in the neighborhood to select cells in the most DA areas; then groups these cells into distinct DA regions.</p>
+<p>DA-seq is a method to detect cell subpopulations with differential abundance between single cell RNA-seq (scRNA-seq) datasets from different samples, described in the preprint, “Detection of differentially abundant cell subpopulations discriminates biological states in scRNA-seq data”, available <a href="https://www.biorxiv.org/content/10.1101/711929v3">here</a>. Given a low dimensional transformation, for example principal component analysis (PCA), of the merged gene expression matrices from different samples (biological states, conditions, etc.), DA-seq first computes a score vector for each cell to represent the DA behavior in the neighborhood to select cells in the most DA neighborhoods; then groups these cells into distinct DA cell subpopulations.</p>
<p><a href="https://github.com/KlugerLab/DAseq">This</a> repository contains the DA-seq package.</p>
</div>
<div id="r-dependencies" class="section level2">

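The selection and grouping steps from the same paragraph can be sketched similarly. This assumes a per-cell DA measure in [0, 1] is already available (for instance, a classifier's out-of-fold probability computed from the score vectors) and uses hypothetical thresholds `upper` and `lower`. For grouping it swaps in a simple technique, connected components of a radius graph (single linkage), which is not necessarily the clustering method the DAseq package uses; `select_and_group` and `radius` are illustrative names.

```python
import numpy as np


def select_and_group(embedding, da_measure, upper, lower, radius):
    """Hypothetical sketch: threshold a per-cell DA measure, then group cells.

    Cells with da_measure > upper and cells with da_measure < lower are handled
    as two separate sets. Within each set, cells closer than `radius` are linked
    and every connected component becomes one subpopulation.
    Returns an int array of group ids; -1 marks cells that were not selected.
    """
    embedding = np.asarray(embedding, dtype=float)
    da_measure = np.asarray(da_measure, dtype=float)
    groups = np.full(len(da_measure), -1)
    next_id = 0
    for mask in (da_measure > upper, da_measure < lower):
        idx = np.flatnonzero(mask)
        if idx.size == 0:
            continue
        pts = embedding[idx]
        # Adjacency of the radius graph among the selected cells only.
        adj = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2) < radius
        unvisited = set(range(len(idx)))
        while unvisited:
            # Depth-first search labels one connected component.
            start = unvisited.pop()
            groups[idx[start]] = next_id
            stack = [start]
            while stack:
                node = stack.pop()
                for nb in np.flatnonzero(adj[node]).tolist():
                    if nb in unvisited:
                        unvisited.remove(nb)
                        groups[idx[nb]] = next_id
                        stack.append(nb)
            next_id += 1
    return groups
```

Enriched and depleted cells are grouped separately, so a single subpopulation never mixes the two directions of change.
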
inst/DA_logit.py

deleted 100644 → 0
+0 −141
import os, sys
import numpy as np
import pandas as pd

from keras import backend as K
from keras.models import Model
from keras.layers import Input, Dense
from keras.activations import relu
from keras.callbacks import EarlyStopping

def k_fold_split(x, p, k):
    return [x[p_] for p_ in np.split(p, k)]

def rev_split(xs, p):
    n = len(p)
    inv_p = np.empty(n)
    inv_p[p] = np.arange(n)
    inv_p = inv_p.astype(int)
    return np.concatenate(xs)[inv_p]

def make_splits(n, k):
    sizes = int(n / k)
    splits = [sizes] * (k - 1) + [sizes + n % k]
    for i in range(k - 1):
        splits[i+1] = splits[i] + splits[i+1]
    return splits[:-1]

def k_fold_predict(data, labels, k_folds, es_patience=10, architecture=[8]*8, activations='relu', end_activation='sigmoid'):
    # os.environ['CUDA_VISIBLE_DEVICES'] = '4'
    # build layers
    layers = []
    for width in architecture:
        layers.append(Dense(width, activation=activations))
    layers.append(Dense(1, activation=end_activation))

    # build neural network
    input_shape = data.shape[1:]
    x = x0 = Input(shape=input_shape)
    for layer in layers:
        x = layer(x)

    model = Model(inputs=x0, outputs=x)

    y_tests = []
    p = np.random.permutation(len(data))

    for i in range(k_folds):
        val_idx = (i - 1) % k_folds
        test_idx = i
        k_folds_ = make_splits(len(data), k_folds)
        x_full, y_full = k_fold_split(data, p, k_folds_), k_fold_split(labels, p, k_folds_)
        x_val_, y_val_ = x_full[val_idx], y_full[val_idx]
        x_test_, y_test_ = x_full[test_idx], y_full[test_idx]
        # remove VALIDATION AND TEST SETS from the training set
        if k_folds > 2:
            del x_full[val_idx]
            del y_full[val_idx]
        # if val_idx came before test_idx, we have to remove the (test_idx - 1)th element (as we have already deleted val_idx so the index corresponding to the test set has changed)
        if k_folds > 1 and val_idx < test_idx:
            del x_full[(test_idx-1)]
            del y_full[(test_idx-1)]
        elif k_folds > 1:
        # otherwise, simply remove the (test_idx)th element
            del x_full[test_idx]
            del y_full[test_idx]

        x_train_, y_train_ = np.concatenate(x_full), np.concatenate(y_full)

        model.compile('adam', loss='binary_crossentropy', metrics=['acc'])

        epochs = 1000
        batch_size = len(x_train_)

        model.fit(
            x=x_train_,
            y=y_train_,
            epochs=epochs,
            batch_size=batch_size,
            validation_data=[x_val_, y_val_],
            callbacks=[EarlyStopping(patience=es_patience)],
            verbose=0)

#        print("Finished {} / {} folds.".format(i + 1, k_folds))

        y_tests.append(model.predict(x_test_).reshape((-1,)))

    y_full = rev_split(y_tests, p)

    return y_full


def k_fold_predict_linear(data, labels, k_folds, es_patience):
    return k_fold_predict(data, labels, k_folds, es_patience=es_patience, architecture=[], activations=None, end_activation='sigmoid')
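
In the deleted `k_fold_predict`, the Keras model is built once before the fold loop and only re-compiled inside it. Recompiling a Keras model does not reset its weights, so each later fold starts from weights fitted on earlier training sets, which include the current test fold. The sketch below shows the same out-of-fold scheme (validation fold = previous fold, test fold = current fold, full-batch training with early stopping) with a fresh model per fold. It is a hedged illustration, not a drop-in replacement: it swaps Keras and the `adam` optimizer for plain NumPy gradient descent on a logistic model (the single sigmoid unit that `k_fold_predict_linear` builds), and the names `k_fold_predict_logistic`, `_fit_logistic`, `_sigmoid`, the learning rate `lr=0.1` and the `seed` argument are all hypothetical.

```python
import numpy as np


def _sigmoid(z):
    # Clip to avoid overflow in exp for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def _fit_logistic(x_tr, y_tr, x_va, y_va, lr=0.1, max_epochs=1000, patience=10):
    """Full-batch gradient descent on binary cross-entropy with early stopping."""
    w = np.zeros(x_tr.shape[1])  # fresh weights every call
    b = 0.0
    best_loss, best_w, best_b = np.inf, w.copy(), b
    wait, eps = 0, 1e-12
    for _ in range(max_epochs):
        grad = _sigmoid(x_tr @ w + b) - y_tr  # d(BCE)/d(logit)
        w -= lr * (x_tr.T @ grad) / len(y_tr)
        b -= lr * grad.mean()
        p_va = _sigmoid(x_va @ w + b)
        val_loss = -np.mean(y_va * np.log(p_va + eps) + (1 - y_va) * np.log(1 - p_va + eps))
        if val_loss < best_loss:
            best_loss, best_w, best_b = val_loss, w.copy(), b
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_w, best_b


def k_fold_predict_logistic(data, labels, k_folds, es_patience=10, seed=0):
    """Out-of-fold P(label == 1) for every row, one fresh model per fold."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if k_folds < 3:
        raise ValueError("this sketch needs k_folds >= 3 (separate val and test folds)")
    perm = np.random.default_rng(seed).permutation(len(data))
    folds = np.array_split(perm, k_folds)
    preds = np.empty(len(data))
    for i in range(k_folds):
        val_i = (i - 1) % k_folds
        test, val = folds[i], folds[val_i]
        train = np.concatenate([f for j, f in enumerate(folds) if j not in (i, val_i)])
        w, b = _fit_logistic(data[train], labels[train], data[val], labels[val],
                             patience=es_patience)
        # Writing by index replaces the permute / un-permute step (rev_split).
        preds[test] = _sigmoid(data[test] @ w + b)
    return preds
```

Because every fold's model starts from zero weights, each prediction comes from a model that never saw that row during training or early stopping. The sketch also keeps the weights from the best validation epoch, which the Keras `EarlyStopping` callback does only when `restore_best_weights=True`.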