forget to cast as factor in compute_hash (de0c2af7) · Commits · github_fork / Presto 20250701

R/pseudobulk.R

+1 −1

Original line number	Original line	Diff line number	Diff line
	@@ -3,7 +3,7 @@ compute_hash <- function(data_df, vars_use) {
	base <- 1		base <- 1
	hash <- rep(0, nrow(data_df))		hash <- rep(0, nrow(data_df))
	for (varname in vars_use) {		for (varname in vars_use) {
	vals <- data.frame(data_df)[, varname, drop = TRUE]		vals <- factor(data.frame(data_df)[, varname, drop = TRUE])
	nlevel <- nlevels(vals)		nlevel <- nlevels(vals)
	hash <- hash + (as.integer(vals) - 1) * base		hash <- hash + (as.integer(vals) - 1) * base
	base <- base * nlevel		base <- base * nlevel

vignettes/pseudobulk.ipynb

+1 −25

Original line number	Original line	Diff line number	Diff line
	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	This notebook implements an efficient version of pseudobulk nb-glm based differential expression analysis with DESeq2. Pseudobulk means that all reads from a single batch group (e.g. donor) get pooled into a single observation.		This notebook implements an efficient version of pseudobulk nb-glm based differential expression analysis with DESeq2. Pseudobulk means that all reads from a single batch group (e.g. donor) get pooled into a single observation.

	In general, pseudobulk is statistically preferable to but much slower than Wilcoxon, especially when you need to consider covariates. A more robust but considerably slower alternative to pseudobulk is including donors as random effects. Random effects are preferable for small cell count groups but likely give similar results to pseudobulk estimates for large groups.		In general, pseudobulk is statistically preferable to but much slower than Wilcoxon, especially when you need to consider covariates. A more robust but considerably slower alternative to pseudobulk is including donors as random effects. Random effects are preferable for small cell count groups but likely give similar results to pseudobulk estimates for large groups.

	This idea is not at all new. The earliest reference I know for is from Lun et al:		This idea is not at all new. The earliest reference I know for is from Lun et al:
	https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0947-7.		https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0947-7.


	A few implementation notes:		A few implementation notes:

	1) To find markers most upregulated in a cluster, I divide samples into those in and out of the cluster. An alternative is to let each out group remain an independent pseudobulk sample. This is in fact the recommended way from Mike Love: https://support.bioconductor.org/p/118090/. While this is certainly faster than re-estimate size factors for each cluster-specific analysis, I find it gives strange results. Namely, I get more inflated p-values and significant p-values for the wrong canonical marker genes (e.g. CD14 for B cells).		1) To find markers most upregulated in a cluster, I divide samples into those in and out of the cluster. An alternative is to let each out group remain an independent pseudobulk sample. This is in fact the recommended way from Mike Love: https://support.bioconductor.org/p/118090/. While this is certainly faster than re-estimate size factors for each cluster-specific analysis, I find it gives strange results. Namely, I get more inflated p-values and significant p-values for the wrong canonical marker genes (e.g. CD14 for B cells).

	2) On my laptop, it takes ~20 seconds to run do ~3000 genes from 2700 cells, 3 donors, 2 batches, and 9 cell types.		2) On my laptop, it takes ~20 seconds to run do ~3000 genes from 2700 cells, 3 donors, 2 batches, and 9 cell types.

	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	# Load some data		# Load some data

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	suppressPackageStartupMessages({		suppressPackageStartupMessages({
	library(tidyverse)		library(tidyverse)
	library(presto)		library(presto)
	library(singlecellmethods)		library(singlecellmethods)
	library(SeuratData)		library(SeuratData)
	library(Seurat)		library(Seurat)
	library(DESeq2)		library(DESeq2)
	})		})
	```		```

	%% Output		%% Output

	Warning message:		Warning message:
	“package ‘DESeq2’ was built under R version 3.6.1”Warning message:		“package ‘DESeq2’ was built under R version 3.6.1”Warning message:
	“package ‘S4Vectors’ was built under R version 3.6.1”Warning message:		“package ‘S4Vectors’ was built under R version 3.6.1”Warning message:
	“package ‘BiocGenerics’ was built under R version 3.6.1”Warning message:		“package ‘BiocGenerics’ was built under R version 3.6.1”Warning message:
	“package ‘IRanges’ was built under R version 3.6.1”Warning message:		“package ‘IRanges’ was built under R version 3.6.1”Warning message:
	“package ‘GenomicRanges’ was built under R version 3.6.1”Warning message:		“package ‘GenomicRanges’ was built under R version 3.6.1”Warning message:
	“package ‘GenomeInfoDb’ was built under R version 3.6.1”Warning message:		“package ‘GenomeInfoDb’ was built under R version 3.6.1”Warning message:
	“package ‘SummarizedExperiment’ was built under R version 3.6.1”Warning message:		“package ‘SummarizedExperiment’ was built under R version 3.6.1”Warning message:
	“package ‘Biobase’ was built under R version 3.6.1”Warning message:		“package ‘Biobase’ was built under R version 3.6.1”Warning message:
	“package ‘DelayedArray’ was built under R version 3.6.1”Warning message:		“package ‘DelayedArray’ was built under R version 3.6.1”Warning message:
	“package ‘BiocParallel’ was built under R version 3.6.1”		“package ‘BiocParallel’ was built under R version 3.6.1”

	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	Load small dataset for exposition		Load small dataset for exposition

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	if (!SeuratData::AvailableData()['pbmc3k.SeuratData', 'Installed']) {		if (!SeuratData::AvailableData()['pbmc3k.SeuratData', 'Installed']) {
	SeuratData::InstallData("pbmc3k")		SeuratData::InstallData("pbmc3k")
	}		}
	data("pbmc3k")		data("pbmc3k")
	```		```

	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	Add fake donor and batch columns		Add fake donor and batch columns

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	pbmc3k@meta.data$donor <- factor(sample(LETTERS[1:3], ncol(pbmc3k), TRUE))		pbmc3k@meta.data$donor <- factor(sample(LETTERS[1:3], ncol(pbmc3k), TRUE))
	pbmc3k@meta.data$batch <- factor(sample(LETTERS[1:2], ncol(pbmc3k), TRUE))		pbmc3k@meta.data$batch <- factor(sample(LETTERS[1:2], ncol(pbmc3k), TRUE))
	```		```

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	head(pbmc3k@meta.data)		head(pbmc3k@meta.data)
	```		```

	%% Output		%% Output


	A data.frame: 6 × 6		A data.frame: 6 × 6

	\| <!--/--> \| orig.ident <fct> \| nCount_RNA <dbl> \| nFeature_RNA <int> \| seurat_annotations <fct> \| donor <fct> \| batch <fct> \|		\| <!--/--> \| orig.ident <fct> \| nCount_RNA <dbl> \| nFeature_RNA <int> \| seurat_annotations <fct> \| donor <fct> \| batch <fct> \|
	\|---\|---\|---\|---\|---\|---\|---\|		\|---\|---\|---\|---\|---\|---\|---\|
	\| AAACATACAACCAC \| pbmc3k \| 2419 \| 779 \| Memory CD4 T \| C \| A \|		\| AAACATACAACCAC \| pbmc3k \| 2419 \| 779 \| Memory CD4 T \| C \| A \|
	\| AAACATTGAGCTAC \| pbmc3k \| 4903 \| 1352 \| B \| B \| A \|		\| AAACATTGAGCTAC \| pbmc3k \| 4903 \| 1352 \| B \| B \| A \|
	\| AAACATTGATCAGC \| pbmc3k \| 3147 \| 1129 \| Memory CD4 T \| B \| A \|		\| AAACATTGATCAGC \| pbmc3k \| 3147 \| 1129 \| Memory CD4 T \| B \| A \|
	\| AAACCGTGCTTCCG \| pbmc3k \| 2639 \| 960 \| CD14+ Mono \| C \| A \|		\| AAACCGTGCTTCCG \| pbmc3k \| 2639 \| 960 \| CD14+ Mono \| C \| A \|
	\| AAACCGTGTATGCG \| pbmc3k \| 980 \| 521 \| NK \| B \| A \|		\| AAACCGTGTATGCG \| pbmc3k \| 980 \| 521 \| NK \| B \| A \|
	\| AAACGCACTGGTAC \| pbmc3k \| 2163 \| 781 \| Memory CD4 T \| C \| B \|		\| AAACGCACTGGTAC \| pbmc3k \| 2163 \| 781 \| Memory CD4 T \| C \| B \|

	A data.frame: 6 × 6		A data.frame: 6 × 6
	\begin{tabular}{r\|llllll}		\begin{tabular}{r\|llllll}
	& orig.ident & nCount\_RNA & nFeature\_RNA & seurat\_annotations & donor & batch\\		& orig.ident & nCount\_RNA & nFeature\_RNA & seurat\_annotations & donor & batch\\
	& <fct> & <dbl> & <int> & <fct> & <fct> & <fct>\\		& <fct> & <dbl> & <int> & <fct> & <fct> & <fct>\\
	\hline		\hline
	AAACATACAACCAC & pbmc3k & 2419 & 779 & Memory CD4 T & C & A\\		AAACATACAACCAC & pbmc3k & 2419 & 779 & Memory CD4 T & C & A\\
	AAACATTGAGCTAC & pbmc3k & 4903 & 1352 & B & B & A\\		AAACATTGAGCTAC & pbmc3k & 4903 & 1352 & B & B & A\\
	AAACATTGATCAGC & pbmc3k & 3147 & 1129 & Memory CD4 T & B & A\\		AAACATTGATCAGC & pbmc3k & 3147 & 1129 & Memory CD4 T & B & A\\
	AAACCGTGCTTCCG & pbmc3k & 2639 & 960 & CD14+ Mono & C & A\\		AAACCGTGCTTCCG & pbmc3k & 2639 & 960 & CD14+ Mono & C & A\\
	AAACCGTGTATGCG & pbmc3k & 980 & 521 & NK & B & A\\		AAACCGTGTATGCG & pbmc3k & 980 & 521 & NK & B & A\\
	AAACGCACTGGTAC & pbmc3k & 2163 & 781 & Memory CD4 T & C & B\\		AAACGCACTGGTAC & pbmc3k & 2163 & 781 & Memory CD4 T & C & B\\
	\end{tabular}		\end{tabular}

	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	# Functions		# Functions

	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	## Collapse to pseudobulk		## Collapse to pseudobulk

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	compute_hash <- function(data_df, vars_use) {		compute_hash <- function(data_df, vars_use) {
	base <- 1		base <- 1
	hash <- rep(0, nrow(data_df))		hash <- rep(0, nrow(data_df))
	for (varname in vars_use) {		for (varname in vars_use) {
	vals <- data.frame(data_df)[, varname, drop = TRUE]		vals <- factor(data.frame(data_df)[, varname, drop = TRUE])
	nlevel <- nlevels(vals)		nlevel <- nlevels(vals)
	hash <- hash + (as.integer(vals) - 1) * base		hash <- hash + (as.integer(vals) - 1) * base
	base <- base * nlevel		base <- base * nlevel
	}		}
	return(hash)		return(hash)
	}collapse_counts <- function(counts_mat, meta_data, varnames) {
	## give each unique row a hash value for indexing
	hash <- compute_hash(meta_data, varnames)
	idx_keep <- which(!is.na(hash))
	hash <- hash[idx_keep]
	hash <- factor(sprintf('sample_%d', as.integer(hash)))
	meta_data <- meta_data[idx_keep, ]
	counts_mat <- counts_mat[, idx_keep]

	## one hot encoded design matrix, sample level
	design_collapsed <- data.frame(meta_data)[, varnames] %>%
	cbind(sample_id = hash) %>%
	unique()
	row.names(design_collapsed) <- design_collapsed$sample_id

	## sum over samples
	counts_collapsed <- presto:::sumGroups(counts_mat, hash, 1) %>% t()
	row.names(counts_collapsed) <- row.names(counts_mat)
	colnames(counts_collapsed) <- levels(hash)

	## reorder to match design matrix
	counts_collapsed <- counts_collapsed[, design_collapsed$sample_id]

	return(list(counts_mat = counts_collapsed, meta_data = design_collapsed))
	}		}
	```		```

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	collapse_counts <- function(counts_mat, meta_data, varnames) {		collapse_counts <- function(counts_mat, meta_data, varnames) {
	## give each unique row a hash value for indexing		## give each unique row a hash value for indexing
	hash <- compute_hash(meta_data, varnames)		hash <- compute_hash(meta_data, varnames)
	idx_keep <- which(!is.na(hash))		idx_keep <- which(!is.na(hash))
	hash <- hash[idx_keep]		hash <- hash[idx_keep]
	hash <- factor(sprintf('sample_%d', as.integer(hash)))		hash <- factor(sprintf('sample_%d', as.integer(hash)))
	meta_data <- meta_data[idx_keep, ]		meta_data <- meta_data[idx_keep, ]
	counts_mat <- counts_mat[, idx_keep]		counts_mat <- counts_mat[, idx_keep]

	## one hot encoded design matrix, sample level		## one hot encoded design matrix, sample level
	design_collapsed <- data.frame(meta_data)[, varnames] %>%		design_collapsed <- data.frame(meta_data)[, varnames] %>%
	cbind(sample_id = hash) %>%		cbind(sample_id = hash) %>%
	unique()		unique()
	row.names(design_collapsed) <- design_collapsed$sample_id		row.names(design_collapsed) <- design_collapsed$sample_id

	## sum over samples		## sum over samples
	counts_collapsed <- presto:::sumGroups(counts_mat, hash, 1) %>% t()		counts_collapsed <- presto:::sumGroups(counts_mat, hash, 1) %>% t()
	row.names(counts_collapsed) <- row.names(counts_mat)		row.names(counts_collapsed) <- row.names(counts_mat)
	colnames(counts_collapsed) <- levels(hash)		colnames(counts_collapsed) <- levels(hash)

	## reorder to match design matrix		## reorder to match design matrix
	counts_collapsed <- counts_collapsed[, design_collapsed$sample_id]		counts_collapsed <- counts_collapsed[, design_collapsed$sample_id]

	return(list(counts_mat = counts_collapsed, meta_data = design_collapsed))		return(list(counts_mat = counts_collapsed, meta_data = design_collapsed))
	}		}
	```		```

	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	## DESeq2 wrappers		## DESeq2 wrappers

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	pseudobulk_deseq2 <- function(dge_formula, meta_data, counts_df, verbose=TRUE,		pseudobulk_deseq2 <- function(dge_formula, meta_data, counts_df, verbose=TRUE,
	min_counts_per_sample=10, present_in_min_samples=5) {		min_counts_per_sample=10, present_in_min_samples=5) {
	## filter low expressed genes		## filter low expressed genes
	genes_keep <- which(rowSums(counts_df >= min_counts_per_sample) >= present_in_min_samples)		genes_keep <- which(rowSums(counts_df >= min_counts_per_sample) >= present_in_min_samples)
	if (verbose) {		if (verbose) {
	message(sprintf('Filtered out %d genes, analyzing %d genes', nrow(counts_df) - length(genes_keep), length(genes_keep)))		message(sprintf('Filtered out %d genes, analyzing %d genes', nrow(counts_df) - length(genes_keep), length(genes_keep)))
	}		}
	counts_df <- counts_df[genes_keep, ]		counts_df <- counts_df[genes_keep, ]

	## assume that the first variable in formula is the main contrast variable		## assume that the first variable in formula is the main contrast variable
	all_vars <- unlist(strsplit(tail(as.character(dge_formula), 1), split = ' \\+ '))		all_vars <- unlist(strsplit(tail(as.character(dge_formula), 1), split = ' \\+ '))
	if (verbose) {		if (verbose) {
	message(sprintf('All vars: %s', paste(all_vars, collapse = ', ')))		message(sprintf('All vars: %s', paste(all_vars, collapse = ', ')))
	}		}
	contrast_var <- head(all_vars, 1)		contrast_var <- head(all_vars, 1)
	if (verbose) {		if (verbose) {
	message(sprintf('Contrast var: %s', contrast_var))		message(sprintf('Contrast var: %s', contrast_var))
	}		}
	Reduce(rbind, lapply(unique(meta_data[[contrast_var]]), function(foreground_id) {		Reduce(rbind, lapply(unique(meta_data[[contrast_var]]), function(foreground_id) {
	if (verbose) {		if (verbose) {
	message(foreground_id)		message(foreground_id)
	}		}
	suppressMessages({suppressWarnings({		suppressMessages({suppressWarnings({
	## setup design		## setup design
	design <- meta_data		design <- meta_data
	design[[contrast_var]] <- factor(ifelse(design[[contrast_var]] == foreground_id,		design[[contrast_var]] <- factor(ifelse(design[[contrast_var]] == foreground_id,
	paste0('cluster_', foreground_id),		paste0('cluster_', foreground_id),
	'background'))		'background'))

	## background clusters should not be treated as independent observations		## background clusters should not be treated as independent observations
	res <- collapse_counts(counts_df, design, all_vars)		res <- collapse_counts(counts_df, design, all_vars)
	design <- res$meta_data		design <- res$meta_data
	counts_df <- res$counts_mat		counts_df <- res$counts_mat

	## Do DGE with DESeq2		## Do DGE with DESeq2
	dds <- DESeqDataSetFromMatrix(		dds <- DESeqDataSetFromMatrix(
	countData = counts_df,		countData = counts_df,
	colData = design,		colData = design,
	design = dge_formula) %>%		design = dge_formula) %>%
	DESeq2::DESeq()		DESeq2::DESeq()

	## Get results		## Get results
	contrast_name <- grep('cluster.*_vs_background', resultsNames(dds), value = TRUE)		contrast_name <- grep('cluster.*_vs_background', resultsNames(dds), value = TRUE)
	dge_res <- results(dds, name = contrast_name) %>%		dge_res <- results(dds, name = contrast_name) %>%
	data.frame() %>%		data.frame() %>%
	tibble::rownames_to_column('feature') %>%		tibble::rownames_to_column('feature') %>%
	dplyr::arrange(-stat) %>%		dplyr::arrange(-stat) %>%
	dplyr::mutate(group = foreground_id)		dplyr::mutate(group = foreground_id)
	})})		})})
	return(dge_res)		return(dge_res)
	})) %>%		})) %>%
	dplyr::select(group, feature, everything())		dplyr::select(group, feature, everything())

	}		}
	```		```

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	top_markers_dds <- function(res, n=10, pval_max=1, padj_max=1, lfc_min=1) {		top_markers_dds <- function(res, n=10, pval_max=1, padj_max=1, lfc_min=1) {
	res %>%		res %>%
	dplyr::filter(		dplyr::filter(
	.data$pvalue <= pval_max &		.data$pvalue <= pval_max &
	.data$padj <= padj_max &		.data$padj <= padj_max &
	log2FoldChange >= lfc_min		log2FoldChange >= lfc_min
	) %>%		) %>%
	dplyr::group_by(.data$group) %>%		dplyr::group_by(.data$group) %>%
	dplyr::top_n(n = n, wt = .data$stat) %>%		dplyr::top_n(n = n, wt = .data$stat) %>%
	dplyr::mutate(rank = rank(-.data$stat, ties.method = 'random')) %>%		dplyr::mutate(rank = rank(-.data$stat, ties.method = 'random')) %>%
	dplyr::ungroup() %>%		dplyr::ungroup() %>%
	dplyr::select(.data$feature, .data$group, .data$rank) %>%		dplyr::select(.data$feature, .data$group, .data$rank) %>%
	tidyr::spread(.data$group, .data$feature, fill = NA) %>%		tidyr::spread(.data$group, .data$feature, fill = NA) %>%
	identity()		identity()
	}		}
	```		```

	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	# Test		# Test

	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	## Collapse to pseudobulk		## Collapse to pseudobulk

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	data_collapsed <- collapse_counts(pbmc3k@assays$RNA@counts,		data_collapsed <- collapse_counts(pbmc3k@assays$RNA@counts,
	pbmc3k@meta.data,		pbmc3k@meta.data,
	c('seurat_annotations', 'donor', 'batch'))		c('seurat_annotations', 'donor', 'batch'))
	head(data_collapsed$meta_data)		head(data_collapsed$meta_data)
	```		```

	%% Output		%% Output


	A data.frame: 6 × 4		A data.frame: 6 × 4

	\| <!--/--> \| seurat_annotations <fct> \| donor <fct> \| batch <fct> \| sample_id <fct> \|		\| <!--/--> \| seurat_annotations <fct> \| donor <fct> \| batch <fct> \| sample_id <fct> \|
	\|---\|---\|---\|---\|---\|		\|---\|---\|---\|---\|---\|
	\| sample_19 \| Memory CD4 T \| C \| A \| sample_19 \|		\| sample_19 \| Memory CD4 T \| C \| A \| sample_19 \|
	\| sample_12 \| B \| B \| A \| sample_12 \|		\| sample_12 \| B \| B \| A \| sample_12 \|
	\| sample_10 \| Memory CD4 T \| B \| A \| sample_10 \|		\| sample_10 \| Memory CD4 T \| B \| A \| sample_10 \|
	\| sample_20 \| CD14+ Mono \| C \| A \| sample_20 \|		\| sample_20 \| CD14+ Mono \| C \| A \| sample_20 \|
	\| sample_15 \| NK \| B \| A \| sample_15 \|		\| sample_15 \| NK \| B \| A \| sample_15 \|
	\| sample_46 \| Memory CD4 T \| C \| B \| sample_46 \|		\| sample_46 \| Memory CD4 T \| C \| B \| sample_46 \|

	A data.frame: 6 × 4		A data.frame: 6 × 4
	\begin{tabular}{r\|llll}		\begin{tabular}{r\|llll}
	& seurat\_annotations & donor & batch & sample\_id\\		& seurat\_annotations & donor & batch & sample\_id\\
	& <fct> & <fct> & <fct> & <fct>\\		& <fct> & <fct> & <fct> & <fct>\\
	\hline		\hline
	sample\_19 & Memory CD4 T & C & A & sample\_19\\		sample\_19 & Memory CD4 T & C & A & sample\_19\\
	sample\_12 & B & B & A & sample\_12\\		sample\_12 & B & B & A & sample\_12\\
	sample\_10 & Memory CD4 T & B & A & sample\_10\\		sample\_10 & Memory CD4 T & B & A & sample\_10\\
	sample\_20 & CD14+ Mono & C & A & sample\_20\\		sample\_20 & CD14+ Mono & C & A & sample\_20\\
	sample\_15 & NK & B & A & sample\_15\\		sample\_15 & NK & B & A & sample\_15\\
	sample\_46 & Memory CD4 T & C & B & sample\_46\\		sample\_46 & Memory CD4 T & C & B & sample\_46\\
	\end{tabular}		\end{tabular}

	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	## Do DESeq2		## Do DESeq2

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	res_mat <- pseudobulk_deseq2(~seurat_annotations + donor + batch, data_collapsed$meta_data, data_collapsed$counts_mat, verbose = TRUE)		res_mat <- pseudobulk_deseq2(~seurat_annotations + donor + batch, data_collapsed$meta_data, data_collapsed$counts_mat, verbose = TRUE)
	```		```

	%% Output		%% Output

	Filtered out 10734 genes, analyzing 2980 genes		Filtered out 10734 genes, analyzing 2980 genes
	All vars: seurat_annotations, donor, batch		All vars: seurat_annotations, donor, batch
	Contrast var: seurat_annotations		Contrast var: seurat_annotations
	Memory CD4 T		Memory CD4 T
	B		B
	CD14+ Mono		CD14+ Mono
	NK		NK
	CD8 T		CD8 T
	Naive CD4 T		Naive CD4 T
	FCGR3A+ Mono		FCGR3A+ Mono
	DC		DC
	Platelet		Platelet

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	head(res_mat)		head(res_mat)
	```		```

	%% Output		%% Output


	A data.frame: 6 × 8		A data.frame: 6 × 8

	\| group <fct> \| feature <chr> \| baseMean <dbl> \| log2FoldChange <dbl> \| lfcSE <dbl> \| stat <dbl> \| pvalue <dbl> \| padj <dbl> \|		\| group <fct> \| feature <chr> \| baseMean <dbl> \| log2FoldChange <dbl> \| lfcSE <dbl> \| stat <dbl> \| pvalue <dbl> \| padj <dbl> \|
	\|---\|---\|---\|---\|---\|---\|---\|---\|		\|---\|---\|---\|---\|---\|---\|---\|---\|
	\| Memory CD4 T \| LTB \| 975.8369 \| 1.4834793 \| 0.06369302 \| 23.29108 \| 5.458868e-120 \| 1.161959e-117 \|		\| Memory CD4 T \| LTB \| 975.8369 \| 1.4834793 \| 0.06369302 \| 23.29108 \| 5.458868e-120 \| 1.161959e-117 \|
	\| Memory CD4 T \| IL32 \| 515.3982 \| 1.4510555 \| 0.08988798 \| 16.14293 \| 1.273476e-58 \| 1.308607e-56 \|		\| Memory CD4 T \| IL32 \| 515.3982 \| 1.4510555 \| 0.08988798 \| 16.14293 \| 1.273476e-58 \| 1.308607e-56 \|
	\| Memory CD4 T \| CD3D \| 342.9295 \| 1.1784135 \| 0.07816887 \| 15.07523 \| 2.357007e-51 \| 2.194962e-49 \|		\| Memory CD4 T \| CD3D \| 342.9295 \| 1.1784135 \| 0.07816887 \| 15.07523 \| 2.357007e-51 \| 2.194962e-49 \|
	\| Memory CD4 T \| RPS12 \| 3477.4664 \| 0.4044355 \| 0.02744982 \| 14.73363 \| 3.920595e-49 \| 3.540416e-47 \|		\| Memory CD4 T \| RPS12 \| 3477.4664 \| 0.4044355 \| 0.02744982 \| 14.73363 \| 3.920595e-49 \| 3.540416e-47 \|
	\| Memory CD4 T \| IL7R \| 226.0807 \| 1.4613102 \| 0.10177025 \| 14.35891 \| 9.368457e-47 \| 8.211177e-45 \|		\| Memory CD4 T \| IL7R \| 226.0807 \| 1.4613102 \| 0.10177025 \| 14.35891 \| 9.368457e-47 \| 8.211177e-45 \|
	\| Memory CD4 T \| LDHB \| 435.2004 \| 1.0482839 \| 0.07401560 \| 14.16301 \| 1.551947e-45 \| 1.321372e-43 \|		\| Memory CD4 T \| LDHB \| 435.2004 \| 1.0482839 \| 0.07401560 \| 14.16301 \| 1.551947e-45 \| 1.321372e-43 \|

	A data.frame: 6 × 8		A data.frame: 6 × 8
	\begin{tabular}{r\|llllllll}		\begin{tabular}{r\|llllllll}
	group & feature & baseMean & log2FoldChange & lfcSE & stat & pvalue & padj\\		group & feature & baseMean & log2FoldChange & lfcSE & stat & pvalue & padj\\
	<fct> & <chr> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl>\\		<fct> & <chr> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl>\\
	\hline		\hline
	Memory CD4 T & LTB & 975.8369 & 1.4834793 & 0.06369302 & 23.29108 & 5.458868e-120 & 1.161959e-117\\		Memory CD4 T & LTB & 975.8369 & 1.4834793 & 0.06369302 & 23.29108 & 5.458868e-120 & 1.161959e-117\\
	Memory CD4 T & IL32 & 515.3982 & 1.4510555 & 0.08988798 & 16.14293 & 1.273476e-58 & 1.308607e-56\\		Memory CD4 T & IL32 & 515.3982 & 1.4510555 & 0.08988798 & 16.14293 & 1.273476e-58 & 1.308607e-56\\
	Memory CD4 T & CD3D & 342.9295 & 1.1784135 & 0.07816887 & 15.07523 & 2.357007e-51 & 2.194962e-49\\		Memory CD4 T & CD3D & 342.9295 & 1.1784135 & 0.07816887 & 15.07523 & 2.357007e-51 & 2.194962e-49\\
	Memory CD4 T & RPS12 & 3477.4664 & 0.4044355 & 0.02744982 & 14.73363 & 3.920595e-49 & 3.540416e-47\\		Memory CD4 T & RPS12 & 3477.4664 & 0.4044355 & 0.02744982 & 14.73363 & 3.920595e-49 & 3.540416e-47\\
	Memory CD4 T & IL7R & 226.0807 & 1.4613102 & 0.10177025 & 14.35891 & 9.368457e-47 & 8.211177e-45\\		Memory CD4 T & IL7R & 226.0807 & 1.4613102 & 0.10177025 & 14.35891 & 9.368457e-47 & 8.211177e-45\\
	Memory CD4 T & LDHB & 435.2004 & 1.0482839 & 0.07401560 & 14.16301 & 1.551947e-45 & 1.321372e-43\\		Memory CD4 T & LDHB & 435.2004 & 1.0482839 & 0.07401560 & 14.16301 & 1.551947e-45 & 1.321372e-43\\
	\end{tabular}		\end{tabular}

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	top_markers_dds(res_mat, lfc_min = 1, padj_max = .05)		top_markers_dds(res_mat, lfc_min = 1, padj_max = .05)
	```		```

	%% Output		%% Output


	A tibble: 10 × 10		A tibble: 10 × 10

	\| rank <int> \| Naive CD4 T <chr> \| Memory CD4 T <chr> \| CD14+ Mono <chr> \| B <chr> \| CD8 T <chr> \| FCGR3A+ Mono <chr> \| NK <chr> \| DC <chr> \| Platelet <chr> \|		\| rank <int> \| Naive CD4 T <chr> \| Memory CD4 T <chr> \| CD14+ Mono <chr> \| B <chr> \| CD8 T <chr> \| FCGR3A+ Mono <chr> \| NK <chr> \| DC <chr> \| Platelet <chr> \|
	\|---\|---\|---\|---\|---\|---\|---\|---\|---\|---\|		\|---\|---\|---\|---\|---\|---\|---\|---\|---\|---\|
	\| 1 \| RPS27 \| LTB \| S100A8 \| CD79A \| CCL5 \| LST1 \| NKG7 \| HLA-DPB1 \| PF4 \|		\| 1 \| RPS27 \| LTB \| S100A8 \| CD79A \| CCL5 \| LST1 \| NKG7 \| HLA-DPB1 \| PF4 \|
	\| 2 \| RPS3A \| IL32 \| S100A9 \| HLA-DRA \| GZMK \| FCGR3A \| GZMB \| HLA-DQA1 \| PPBP \|		\| 2 \| RPS3A \| IL32 \| S100A9 \| HLA-DRA \| GZMK \| FCGR3A \| GZMB \| HLA-DQA1 \| PPBP \|
	\| 3 \| RPS25 \| CD3D \| GSTP1 \| CD74 \| GZMH \| AIF1 \| PRF1 \| CLEC10A \| CLU \|		\| 3 \| RPS25 \| CD3D \| GSTP1 \| CD74 \| GZMH \| AIF1 \| PRF1 \| CLEC10A \| CLU \|
	\| 4 \| RPL9 \| IL7R \| LGALS2 \| CD79B \| NKG7 \| FCER1G \| CTSW \| HLA-DQB1 \| GPX1 \|		\| 4 \| RPL9 \| IL7R \| LGALS2 \| CD79B \| NKG7 \| FCER1G \| CTSW \| HLA-DQB1 \| GPX1 \|
	\| 5 \| RPS27A \| LDHB \| TYROBP \| MS4A1 \| CTSW \| COTL1 \| GNLY \| HLA-DQA2 \| TPM4 \|		\| 5 \| RPS27A \| LDHB \| TYROBP \| MS4A1 \| CTSW \| COTL1 \| GNLY \| HLA-DQA2 \| TPM4 \|
	\| 6 \| RPL31 \| CD3E \| S100A6 \| TCL1A \| CD8A \| TYROBP \| SPON2 \| HLA-DRA \| MPP1 \|		\| 6 \| RPL31 \| CD3E \| S100A6 \| TCL1A \| CD8A \| TYROBP \| SPON2 \| HLA-DRA \| MPP1 \|
	\| 7 \| LDHB \| TNFRSF4 \| FCN1 \| HLA-DPB1 \| GZMA \| FTL \| FGFBP2 \| HLA-DPA1 \| CTSA \|		\| 7 \| LDHB \| TNFRSF4 \| FCN1 \| HLA-DPB1 \| GZMA \| FTL \| FGFBP2 \| HLA-DPA1 \| CTSA \|
	\| 8 \| RPS29 \| CD2 \| TYMP \| HLA-DQA1 \| CST7 \| RHOC \| CD247 \| HLA-DMA \| ODC1 \|		\| 8 \| RPS29 \| CD2 \| TYMP \| HLA-DQA1 \| CST7 \| RHOC \| CD247 \| HLA-DMA \| ODC1 \|
	\| 9 \| NPM1 \| AQP3 \| GPX1 \| HLA-DQB1 \| IL32 \| OAZ1 \| CST7 \| FCER1A \| GRAP2 \|		\| 9 \| NPM1 \| AQP3 \| GPX1 \| HLA-DQB1 \| IL32 \| OAZ1 \| CST7 \| FCER1A \| GRAP2 \|
	\| 10 \| CD3D \| TRAT1 \| LGALS1 \| HLA-DRB1 \| CD8B \| IFITM2 \| GZMA \| CD74 \| ILK \|		\| 10 \| CD3D \| TRAT1 \| LGALS1 \| HLA-DRB1 \| CD8B \| IFITM2 \| GZMA \| CD74 \| ILK \|

	A tibble: 10 × 10		A tibble: 10 × 10
	\begin{tabular}{r\|llllllllll}		\begin{tabular}{r\|llllllllll}
	rank & Naive CD4 T & Memory CD4 T & CD14+ Mono & B & CD8 T & FCGR3A+ Mono & NK & DC & Platelet\\		rank & Naive CD4 T & Memory CD4 T & CD14+ Mono & B & CD8 T & FCGR3A+ Mono & NK & DC & Platelet\\
	<int> & <chr> & <chr> & <chr> & <chr> & <chr> & <chr> & <chr> & <chr> & <chr>\\		<int> & <chr> & <chr> & <chr> & <chr> & <chr> & <chr> & <chr> & <chr> & <chr>\\
	\hline		\hline
	1 & RPS27 & LTB & S100A8 & CD79A & CCL5 & LST1 & NKG7 & HLA-DPB1 & PF4 \\		1 & RPS27 & LTB & S100A8 & CD79A & CCL5 & LST1 & NKG7 & HLA-DPB1 & PF4 \\
	2 & RPS3A & IL32 & S100A9 & HLA-DRA & GZMK & FCGR3A & GZMB & HLA-DQA1 & PPBP \\		2 & RPS3A & IL32 & S100A9 & HLA-DRA & GZMK & FCGR3A & GZMB & HLA-DQA1 & PPBP \\
	3 & RPS25 & CD3D & GSTP1 & CD74 & GZMH & AIF1 & PRF1 & CLEC10A & CLU \\		3 & RPS25 & CD3D & GSTP1 & CD74 & GZMH & AIF1 & PRF1 & CLEC10A & CLU \\
	4 & RPL9 & IL7R & LGALS2 & CD79B & NKG7 & FCER1G & CTSW & HLA-DQB1 & GPX1 \\		4 & RPL9 & IL7R & LGALS2 & CD79B & NKG7 & FCER1G & CTSW & HLA-DQB1 & GPX1 \\
	5 & RPS27A & LDHB & TYROBP & MS4A1 & CTSW & COTL1 & GNLY & HLA-DQA2 & TPM4 \\		5 & RPS27A & LDHB & TYROBP & MS4A1 & CTSW & COTL1 & GNLY & HLA-DQA2 & TPM4 \\
	6 & RPL31 & CD3E & S100A6 & TCL1A & CD8A & TYROBP & SPON2 & HLA-DRA & MPP1 \\		6 & RPL31 & CD3E & S100A6 & TCL1A & CD8A & TYROBP & SPON2 & HLA-DRA & MPP1 \\
	7 & LDHB & TNFRSF4 & FCN1 & HLA-DPB1 & GZMA & FTL & FGFBP2 & HLA-DPA1 & CTSA \\		7 & LDHB & TNFRSF4 & FCN1 & HLA-DPB1 & GZMA & FTL & FGFBP2 & HLA-DPA1 & CTSA \\
	8 & RPS29 & CD2 & TYMP & HLA-DQA1 & CST7 & RHOC & CD247 & HLA-DMA & ODC1 \\		8 & RPS29 & CD2 & TYMP & HLA-DQA1 & CST7 & RHOC & CD247 & HLA-DMA & ODC1 \\
	9 & NPM1 & AQP3 & GPX1 & HLA-DQB1 & IL32 & OAZ1 & CST7 & FCER1A & GRAP2\\		9 & NPM1 & AQP3 & GPX1 & HLA-DQB1 & IL32 & OAZ1 & CST7 & FCER1A & GRAP2\\
	10 & CD3D & TRAT1 & LGALS1 & HLA-DRB1 & CD8B & IFITM2 & GZMA & CD74 & ILK \\		10 & CD3D & TRAT1 & LGALS1 & HLA-DRB1 & CD8B & IFITM2 & GZMA & CD74 & ILK \\
	\end{tabular}		\end{tabular}

	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	## Volcano plots		## Volcano plots

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	options(repr.plot.height = 6, repr.plot.width = 8)		options(repr.plot.height = 6, repr.plot.width = 8)
	res_mat %>%		res_mat %>%
	ggplot(aes(log2FoldChange, -log10(pvalue), color = padj < .01 & abs(log2FoldChange) > 1)) +		ggplot(aes(log2FoldChange, -log10(pvalue), color = padj < .01 & abs(log2FoldChange) > 1)) +
	geom_point(shape = 21) +		geom_point(shape = 21) +
	facet_wrap(~group, scales = 'free') +		facet_wrap(~group, scales = 'free') +
	guides(color = FALSE) +		guides(color = FALSE) +
	NULL		NULL
	```		```

	%% Output		%% Output



	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	# Comparison to Wilcoxon		# Comparison to Wilcoxon

	In this artificial example, donor and batch are fictitious, so DESeq2's GLM $\beta$ estimates should not be that different from the Wilcoxon estimates. Here, we'll compare $\beta$s to auROC, which is essentially equivalent to the Wilxocon statistic.		In this artificial example, donor and batch are fictitious, so DESeq2's GLM $\beta$ estimates should not be that different from the Wilcoxon estimates. Here, we'll compare $\beta$s to auROC, which is essentially equivalent to the Wilxocon statistic.

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	## Wilcoxon on CP10K normalized counts		## Wilcoxon on CP10K normalized counts
	exprs_norm <- singlecellmethods::normalizeData(pbmc3k@assays$RNA@counts, scaling_factor = 1e4, method = 'log')		exprs_norm <- singlecellmethods::normalizeData(pbmc3k@assays$RNA@counts, scaling_factor = 1e4, method = 'log')
	dge_wilxocon <- wilcoxauc(exprs_norm, factor(pbmc3k@meta.data$seurat_annotations))		dge_wilxocon <- wilcoxauc(exprs_norm, factor(pbmc3k@meta.data$seurat_annotations))
	```		```

	%% Output		%% Output

	Removing NA values from labels		Removing NA values from labels

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	head(dge_wilxocon)		head(dge_wilxocon)
	```		```

	%% Output		%% Output


	A data.frame: 6 × 10		A data.frame: 6 × 10

	\| feature <chr> \| group <chr> \| avgExpr <dbl> \| logFC <dbl> \| statistic <dbl> \| auc <dbl> \| pval <dbl> \| padj <dbl> \| pct_in <dbl> \| pct_out <dbl> \|		\| feature <chr> \| group <chr> \| avgExpr <dbl> \| logFC <dbl> \| statistic <dbl> \| auc <dbl> \| pval <dbl> \| padj <dbl> \| pct_in <dbl> \| pct_out <dbl> \|
	\|---\|---\|---\|---\|---\|---\|---\|---\|---\|---\|		\|---\|---\|---\|---\|---\|---\|---\|---\|---\|---\|
	\| AL627309.1 \| Naive CD4 T \| 0.002385006 \| -0.0041459259 \| 674621.0 \| 0.4986566 \| 0.2969381 \| 0.5033193 \| 0.1434720 \| 0.4121587 \|		\| AL627309.1 \| Naive CD4 T \| 0.002385006 \| -0.0041459259 \| 674621.0 \| 0.4986566 \| 0.2969381 \| 0.5033193 \| 0.1434720 \| 0.4121587 \|
	\| AP006222.2 \| Naive CD4 T \| 0.000000000 \| -0.0025103485 \| 675393.0 \| 0.4992272 \| 0.2993548 \| 0.5033193 \| 0.0000000 \| 0.1545595 \|		\| AP006222.2 \| Naive CD4 T \| 0.000000000 \| -0.0025103485 \| 675393.0 \| 0.4992272 \| 0.2993548 \| 0.5033193 \| 0.0000000 \| 0.1545595 \|
	\| RP11-206L10.2 \| Naive CD4 T \| 0.002616515 \| 0.0001829296 \| 676017.0 \| 0.4996884 \| 0.7459480 \| 0.8712226 \| 0.1434720 \| 0.2060793 \|		\| RP11-206L10.2 \| Naive CD4 T \| 0.002616515 \| 0.0001829296 \| 676017.0 \| 0.4996884 \| 0.7459480 \| 0.8712226 \| 0.1434720 \| 0.2060793 \|
	\| RP11-206L10.9 \| Naive CD4 T \| 0.000000000 \| -0.0018526156 \| 675393.0 \| 0.4992272 \| 0.2993548 \| 0.5033193 \| 0.0000000 \| 0.1545595 \|		\| RP11-206L10.9 \| Naive CD4 T \| 0.000000000 \| -0.0018526156 \| 675393.0 \| 0.4992272 \| 0.2993548 \| 0.5033193 \| 0.0000000 \| 0.1545595 \|
	\| LINC00115 \| Naive CD4 T \| 0.007142957 \| -0.0051776178 \| 674127.5 \| 0.4982918 \| 0.3475021 \| 0.5567626 \| 0.4304161 \| 0.7727975 \|		\| LINC00115 \| Naive CD4 T \| 0.007142957 \| -0.0051776178 \| 674127.5 \| 0.4982918 \| 0.3475021 \| 0.5567626 \| 0.4304161 \| 0.7727975 \|
	\| NOC2L \| Naive CD4 T \| 0.155593344 \| -0.0060562808 \| 668592.0 \| 0.4942001 \| 0.3741228 \| 0.5813188 \| 8.6083214 \| 9.9948480 \|		\| NOC2L \| Naive CD4 T \| 0.155593344 \| -0.0060562808 \| 668592.0 \| 0.4942001 \| 0.3741228 \| 0.5813188 \| 8.6083214 \| 9.9948480 \|

	A data.frame: 6 × 10		A data.frame: 6 × 10
	\begin{tabular}{r\|llllllllll}		\begin{tabular}{r\|llllllllll}
	feature & group & avgExpr & logFC & statistic & auc & pval & padj & pct\_in & pct\_out\\		feature & group & avgExpr & logFC & statistic & auc & pval & padj & pct\_in & pct\_out\\
	<chr> & <chr> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl>\\		<chr> & <chr> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl> & <dbl>\\
	\hline		\hline
	AL627309.1 & Naive CD4 T & 0.002385006 & -0.0041459259 & 674621.0 & 0.4986566 & 0.2969381 & 0.5033193 & 0.1434720 & 0.4121587\\		AL627309.1 & Naive CD4 T & 0.002385006 & -0.0041459259 & 674621.0 & 0.4986566 & 0.2969381 & 0.5033193 & 0.1434720 & 0.4121587\\
	AP006222.2 & Naive CD4 T & 0.000000000 & -0.0025103485 & 675393.0 & 0.4992272 & 0.2993548 & 0.5033193 & 0.0000000 & 0.1545595\\		AP006222.2 & Naive CD4 T & 0.000000000 & -0.0025103485 & 675393.0 & 0.4992272 & 0.2993548 & 0.5033193 & 0.0000000 & 0.1545595\\
	RP11-206L10.2 & Naive CD4 T & 0.002616515 & 0.0001829296 & 676017.0 & 0.4996884 & 0.7459480 & 0.8712226 & 0.1434720 & 0.2060793\\		RP11-206L10.2 & Naive CD4 T & 0.002616515 & 0.0001829296 & 676017.0 & 0.4996884 & 0.7459480 & 0.8712226 & 0.1434720 & 0.2060793\\
	RP11-206L10.9 & Naive CD4 T & 0.000000000 & -0.0018526156 & 675393.0 & 0.4992272 & 0.2993548 & 0.5033193 & 0.0000000 & 0.1545595\\		RP11-206L10.9 & Naive CD4 T & 0.000000000 & -0.0018526156 & 675393.0 & 0.4992272 & 0.2993548 & 0.5033193 & 0.0000000 & 0.1545595\\
	LINC00115 & Naive CD4 T & 0.007142957 & -0.0051776178 & 674127.5 & 0.4982918 & 0.3475021 & 0.5567626 & 0.4304161 & 0.7727975\\		LINC00115 & Naive CD4 T & 0.007142957 & -0.0051776178 & 674127.5 & 0.4982918 & 0.3475021 & 0.5567626 & 0.4304161 & 0.7727975\\
	NOC2L & Naive CD4 T & 0.155593344 & -0.0060562808 & 668592.0 & 0.4942001 & 0.3741228 & 0.5813188 & 8.6083214 & 9.9948480\\		NOC2L & Naive CD4 T & 0.155593344 & -0.0060562808 & 668592.0 & 0.4942001 & 0.3741228 & 0.5813188 & 8.6083214 & 9.9948480\\
	\end{tabular}		\end{tabular}

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	options(repr.plot.height = 6, repr.plot.width = 8)		options(repr.plot.height = 6, repr.plot.width = 8)
	dplyr::inner_join(dge_wilxocon, res_mat, by = c('feature', 'group')) %>%		dplyr::inner_join(dge_wilxocon, res_mat, by = c('feature', 'group')) %>%
	ggplot(aes(auc, stat)) +		ggplot(aes(auc, stat)) +
	geom_point(shape = '.') +		geom_point(shape = '.') +
	facet_wrap(~group, scales = 'free') +		facet_wrap(~group, scales = 'free') +
	geom_vline(xintercept = .5) +		geom_vline(xintercept = .5) +
	geom_hline(yintercept = 0) +		geom_hline(yintercept = 0) +
	labs(x = 'AUC', y = 'GLM beta') +		labs(x = 'AUC', y = 'GLM beta') +
	NULL		NULL
	```		```

	%% Output		%% Output

	Warning message:		Warning message:
	“Column `group` joining character vector and factor, coercing into character vector”		“Column `group` joining character vector and factor, coercing into character vector”



	%% Cell type:markdown id: tags:		%% Cell type:markdown id: tags:

	Most of the results agree, more or less. Interestingly, the Wilcoxon labels almost all genes as upregulated in DCs and CD16+ Monocytes and downregulated in Platelets. What's going on here? It turns out that DCs and CD16+ Monos are mRNA rich cells while platelets are mRNA poor cells overall. DESeq2 is able to account for this effect better than CP10K normalization.		Most of the results agree, more or less. Interestingly, the Wilcoxon labels almost all genes as upregulated in DCs and CD16+ Monocytes and downregulated in Platelets. What's going on here? It turns out that DCs and CD16+ Monos are mRNA rich cells while platelets are mRNA poor cells overall. DESeq2 is able to account for this effect better than CP10K normalization.

	%% Cell type:code id: tags:		%% Cell type:code id: tags:

	``` R		``` R
	options(repr.plot.height = 3, repr.plot.width = 5)		options(repr.plot.height = 3, repr.plot.width = 5)
	pbmc3k@meta.data %>%		pbmc3k@meta.data %>%
	subset(!is.na(seurat_annotations)) %>%		subset(!is.na(seurat_annotations)) %>%
	ggplot(aes(reorder(seurat_annotations, nCount_RNA), nCount_RNA)) +		ggplot(aes(reorder(seurat_annotations, nCount_RNA), nCount_RNA)) +
	geom_boxplot(outlier.shape = NA) +		geom_boxplot(outlier.shape = NA) +
	geom_jitter(shape = '.', height = 0) +		geom_jitter(shape = '.', height = 0) +
	coord_flip() +		coord_flip() +
	labs(x = '') +		labs(x = '') +
	NULL		NULL
	```		```

	%% Output		%% Output



	%% Cell type:raw id: tags:		%% Cell type:raw id: tags:

Admin message