update website (34262936) · Commits · github_fork / DAseq2

README.md

+1 −1

Original line number	Diff line number	Diff line
		# DA-seq

		## Introduction
		DA-seq is a method to detect cell subpopulations with differential abundance between single cell RNA-seq (scRNA-seq) datasets from different samples, described in the preprint, "Detection of differentially abundant cell subpopulations discriminates biological states in scRNA-seq data", preprint available [here](https://www.biorxiv.org/content/10.1101/711929v3). Given a low dimensional transformation, for example principal component analysis (PCA), of the merged gene expression matrices from different samples (biological states, conditions, etc.), DA-seq first computes a score vector for each cell to represent the DA behavior in the neighborhood to select cells in the most DA neighborhoods; then groups these cells into distinct DA cell subpopulations.
		DA-seq is a method to detect cell subpopulations with differential abundance between single cell RNA-seq (scRNA-seq) datasets from different samples, described in the preprint, "Detection of differentially abundant cell subpopulations discriminates biological states in scRNA-seq data", available [here](https://www.biorxiv.org/content/10.1101/711929v3). Given a low dimensional transformation, for example principal component analysis (PCA), of the merged gene expression matrices from different samples (biological states, conditions, etc.), DA-seq first computes a score vector for each cell to represent the DA behavior in the neighborhood to select cells in the most DA neighborhoods; then groups these cells into distinct DA cell subpopulations.

		[This](https://github.com/KlugerLab/DAseq) repository contains the DA-seq package.
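The introduction says DA-seq computes a score vector per cell that represents DA behavior in the cell's neighborhood, but it does not spell out the computation. As a rough illustration only, the sketch below assumes the score is the fraction of a cell's k nearest neighbors that come from one of two conditions, evaluated at several values of k. The function name `knn_label_fractions`, the brute-force distance computation, and the toy data are all assumptions made for this example, not the package's implementation.

```python
import numpy as np

def knn_label_fractions(embedding, labels, ks):
    """Hypothetical helper: for each cell, the fraction of its k nearest
    neighbors carrying label 1, one column per k in ks."""
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)       # pairwise squared distances
    np.fill_diagonal(dist2, np.inf)        # a cell is not its own neighbor
    order = np.argsort(dist2, axis=1)      # neighbors sorted by distance
    return np.stack([labels[order[:, :k]].mean(axis=1) for k in ks], axis=1)

# Toy low-dimensional embedding: two well-separated groups of 10 cells,
# each group drawn entirely from one condition (label 0 or label 1).
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.1, size=(10, 2))
cluster_b = rng.normal(5.0, 0.1, size=(10, 2))
embedding = np.vstack([cluster_a, cluster_b])
labels = np.r_[np.zeros(10), np.ones(10)]

scores = knn_label_fractions(embedding, labels, ks=[3, 5])
```

In this toy case every neighborhood is pure, so each row of `scores` is all zeros or all ones; cells with mixed neighborhoods would get intermediate values.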

docs/index.html

+3 −3

		@@ -75,13 +75,13 @@

		</header><div class="row">
		<div class="contents col-md-9">
		<div id="da-seq-detecting-regions-of-differential-abundance-between-scrna-seq-datasets" class="section level1">
		<div id="da-seq" class="section level1">
		<div class="page-header"><h1 class="hasAnchor">
		<a href="#da-seq-detecting-regions-of-differential-abundance-between-scrna-seq-datasets" class="anchor"></a>DA-seq (Detecting regions of differential abundance between scRNA-seq datasets)</h1></div>
		<a href="#da-seq" class="anchor"></a>DA-seq</h1></div>
		<div id="introduction" class="section level2">
		<h2 class="hasAnchor">
		<a href="#introduction" class="anchor"></a>Introduction</h2>
		<p>DA-seq is a method to detect cell subpopulations with differential abundance between single cell RNA-seq (scRNA-seq) datasets from different samples, described in the preprint, “Detecting regions of differential abundance between scRNA-Seq datasets” available <a href="https://www.biorxiv.org/content/10.1101/711929v2">here</a>. Given a low dimensional transformation, for example principal component analysis (PCA), of the merged gene expression matrices from different samples (cell states, condition, etc.), DA-seq first computes a score vector for each cell to represent the DA behavior in the neighborhood to select cells in the most DA areas; then groups these cells into distinct DA regions.</p>
		<p>DA-seq is a method to detect cell subpopulations with differential abundance between single cell RNA-seq (scRNA-seq) datasets from different samples, described in the preprint, “Detection of differentially abundant cell subpopulations discriminates biological states in scRNA-seq data”, available <a href="https://www.biorxiv.org/content/10.1101/711929v3">here</a>. Given a low dimensional transformation, for example principal component analysis (PCA), of the merged gene expression matrices from different samples (biological states, conditions, etc.), DA-seq first computes a score vector for each cell to represent the DA behavior in the neighborhood to select cells in the most DA neighborhoods; then groups these cells into distinct DA cell subpopulations.</p>
		<p><a href="https://github.com/KlugerLab/DAseq">This</a> repository contains the DA-seq package.</p>
		</div>
		<div id="r-dependencies" class="section level2">

inst/DA_logit.py

deleted 100644 → 0

+0 −141

		import os, sys
		import numpy as np
		import pandas as pd

		from keras import backend as K
		from keras.models import Model
		from keras.layers import Input, Dense
		from keras.activations import relu
		from keras.callbacks import EarlyStopping

		def k_fold_split(x, p, k):
		    return [x[p_] for p_ in np.split(p, k)]

		def rev_split(xs, p):
		    n = len(p)
		    inv_p = np.empty(n)
		    inv_p[p] = np.arange(n)
		    inv_p = inv_p.astype(int)
		    return np.concatenate(xs)[inv_p]

		def make_splits(n, k):
		    sizes = int(n / k)
		    splits = [sizes] * (k - 1) + [sizes + n % k]
		    for i in range(k - 1):
		        splits[i+1] = splits[i] + splits[i+1]
		    return splits[:-1]
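The three helpers above depend only on NumPy, so they can be checked without Keras. The sketch below copies them as they appear in the deleted file (with indentation restored) and shows, on a made-up array of 10 items and 3 folds, that `make_splits` yields cumulative split points, `k_fold_split` cuts the permuted data at those points, and `rev_split` puts the concatenated folds back into the original order. The sample data and variable names outside the three functions are assumptions for illustration.

```python
import numpy as np

def k_fold_split(x, p, k):
    # k is a list of split points; p is a permutation of range(len(x))
    return [x[p_] for p_ in np.split(p, k)]

def rev_split(xs, p):
    # invert the permutation so concatenated folds return to input order
    n = len(p)
    inv_p = np.empty(n)
    inv_p[p] = np.arange(n)
    inv_p = inv_p.astype(int)
    return np.concatenate(xs)[inv_p]

def make_splits(n, k):
    # cumulative split points; the last fold absorbs the remainder n % k
    sizes = int(n / k)
    splits = [sizes] * (k - 1) + [sizes + n % k]
    for i in range(k - 1):
        splits[i+1] = splits[i] + splits[i+1]
    return splits[:-1]

x = np.arange(10) * 10                     # toy data: 0, 10, ..., 90
p = np.random.default_rng(0).permutation(len(x))

split_points = make_splits(len(x), 3)      # [3, 6]
folds = k_fold_split(x, p, split_points)   # fold sizes 3, 3, 4
restored = rev_split(folds, p)             # same order as x
```

Because `rev_split` only needs one array per fold in fold order, it works equally well on per-fold predictions, which is how `k_fold_predict` uses it.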

		def k_fold_predict(data, labels, k_folds, es_patience=10, architecture=[8]*8, activations='relu', end_activation='sigmoid'):
		    # os.environ['CUDA_VISIBLE_DEVICES'] = '4'
		    # build layers
		    layers = []
		    for width in architecture:
		        layers.append(Dense(width, activation=activations))
		    layers.append(Dense(1, activation=end_activation))

		    # build neural network
		    input_shape = data.shape[1:]
		    x = x0 = Input(shape=input_shape)
		    for layer in layers:
		        x = layer(x)

		    model = Model(inputs=x0, outputs=x)

		    y_tests = []

		    p = np.random.permutation(len(data))

		    for i in range(k_folds):
		        val_idx = (i - 1) % k_folds
		        test_idx = i
		        k_folds_ = make_splits(len(data), k_folds)
		        x_full, y_full = k_fold_split(data, p, k_folds_), k_fold_split(labels, p, k_folds_)
		        x_val_, y_val_ = x_full[val_idx], y_full[val_idx]
		        x_test_, y_test_ = x_full[test_idx], y_full[test_idx]
		        # remove VALIDATION AND TEST SETS from the training set
		        if k_folds > 2:
		            del x_full[val_idx]
		            del y_full[val_idx]
		        # if val_idx came before test_idx, we have to remove the (test_idx - 1)th element (as we have already deleted val_idx so the index corresponding to the test set has changed)
		        if k_folds > 1 and val_idx < test_idx:
		            del x_full[(test_idx-1)]
		            del y_full[(test_idx-1)]
		        elif k_folds > 1:
		            # otherwise, simply remove the (test_idx)th element
		            del x_full[test_idx]
		            del y_full[test_idx]

		        x_train_, y_train_ = np.concatenate(x_full), np.concatenate(y_full)

		        model.compile('adam', loss='binary_crossentropy', metrics=['acc'])

		        epochs = 1000
		        batch_size = len(x_train_)

		        model.fit(
		            x=x_train_,
		            y=y_train_,
		            epochs=epochs,
		            batch_size=batch_size,
		            validation_data=[x_val_, y_val_],
		            callbacks=[EarlyStopping(patience=es_patience)],
		            verbose=0)

		        # print("Finished {} / {} folds.".format(i + 1, k_folds))

		        y_tests.append(model.predict(x_test_).reshape((-1,)))

		    y_full = rev_split(y_tests, p)

		    return y_full



		def k_fold_predict_linear(data, labels, k_folds, es_patience):
		    return k_fold_predict(data, labels, k_folds, es_patience=es_patience, architecture=[], activations=None, end_activation='sigmoid')
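With an empty `architecture`, `k_fold_predict_linear` reduces to a single sigmoid unit, i.e. logistic regression scored out of fold. The sketch below illustrates that idea using NumPy only. It is not the deleted implementation: it assumes plain full-batch gradient descent with a fixed number of steps in place of Adam with early stopping, it has no separate validation fold, and it re-initializes the weights for every fold (the deleted code builds one Keras model before the loop and keeps training it across folds). The names `fit_logistic` and `out_of_fold_scores`, the hyperparameters, and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, lr=0.1, n_iter=500):
    """Hypothetical helper: logistic regression by full-batch gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(n_iter):
        grad = sigmoid(x @ w + b) - y      # gradient of the mean log-loss
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def out_of_fold_scores(x, y, k_folds, seed=0):
    """Score every sample with a model that never saw it (needs k_folds >= 2)."""
    perm = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(perm, k_folds)
    scores = np.empty(len(x))
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w, b = fit_logistic(x[train_idx], y[train_idx])   # fresh weights per fold
        scores[test_idx] = sigmoid(x[test_idx] @ w + b)
    return scores

# Toy data: two Gaussian blobs standing in for cells from two conditions.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.r_[np.zeros(50), np.ones(50)]
scores = out_of_fold_scores(x, y, k_folds=5)
```

Writing each fold's predictions directly into `scores[test_idx]` plays the role of `rev_split`: the result is already aligned with the input order.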
