Commit a93c7c46 authored by cziegenhain

BC hamming combinatorics

parent 68150255
+1 −1
@@ -31,7 +31,7 @@ We provide a script to convert zUMIs output into loom file automatically based o
zUMIs will try to automatically do this, otherwise convert zUMIs output to loom by simply running `Rscript rds2loom.R myRun.yaml`.

## Changelog
-18 Sept 2020: zUMIs.2.9.4b/c/d/e: Fix & speed up Smart-seq3 UMI read counting. Prevent crash when a chunk of cell BCs does not match any downsampling. Speed up barcode detection steps for some cases. Prevent too much CPU usage in UMI error correction. Take correct samtools executable in gene annotation parsing.
+18 Sept 2020 - 29 Nov 2020: zUMIs.2.9.4b/c/d/e/f: Fix & speed up Smart-seq3 UMI read counting. Prevent crash when a chunk of cell BCs does not match any downsampling. Speed up barcode detection steps for some cases. Prevent too much CPU usage in UMI error correction. Take correct samtools executable in gene annotation parsing. Prevent crash in BC error correction for huge datasets.

12 Sept 2020: [zUMIs2.9.4](https://github.com/sdparekh/zUMIs/releases/tag/2.9.4): Speed writing of error-corrected UMI tags to bam file up significantly. Prevent potential crash when no cells meet any user-defined downsampling criteria.

+6 −2
@@ -264,8 +264,12 @@ BCbin <- function(bccount_file, bc_detected) {
  nocell_BCs <- nocell_bccount[,XC]

  if(opt$barcodes$BarcodeBinning>0){
-    #break up in pieces of 2000 real BCs in case the hamming distance calculation gets too large!
-    true_chunks <- split(true_BCs, ceiling(seq_along(true_BCs)/2000))
+    #break up in pieces of real BCs in case the hamming distance calculation gets too large!
+    combinatorics = as.numeric(length(true_BCs))*as.numeric(length(nocell_BCs))
+    min_chunks = ceiling(combinatorics/(2*10^9))
+    cells_per_chunk = floor(length(true_BCs)/min_chunks)
+    
+    true_chunks <- split(true_BCs, ceiling(seq_along(true_BCs)/cells_per_chunk))
    for(i in 1:length(true_chunks)){
      dists <- stringdist::stringdistmatrix(true_chunks[[i]],nocell_BCs,method="hamming", nthread = opt$num_threads)
      dists <- setDT(data.frame(dists))
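
The chunk-size arithmetic in this hunk can be sketched in isolation as below. This is a minimal illustration, not the zUMIs code itself: `chunk_barcodes` and its `max_cells` argument are hypothetical names, the barcodes are made up, and the two `max(1, ...)` guards are an added assumption that is not in the commit. The commit hard-codes the cap at `2*10^9` cells per distance matrix, presumably to stay under R's 32-bit integer limit, though the source does not say so.

```r
# Hypothetical helper (not part of zUMIs) mirroring the chunk-size arithmetic above.
# max_cells is an assumed parameter name; the commit hard-codes 2*10^9.
chunk_barcodes <- function(true_BCs, nocell_BCs, max_cells = 2e9) {
  # as.numeric() avoids integer overflow: multiplying two large integer
  # lengths would otherwise yield NA with a warning.
  combinatorics <- as.numeric(length(true_BCs)) * as.numeric(length(nocell_BCs))
  # Smallest number of chunks that keeps each distance matrix under max_cells.
  # The max(1, ...) guards are an assumption added here to avoid dividing by zero.
  min_chunks <- max(1, ceiling(combinatorics / max_cells))
  cells_per_chunk <- max(1, floor(length(true_BCs) / min_chunks))
  split(true_BCs, ceiling(seq_along(true_BCs) / cells_per_chunk))
}

# Toy input: 10 "real" barcodes against 5 "no-cell" barcodes, cap of 20 cells.
true_BCs <- sprintf("BC%02d", 1:10)
nocell_BCs <- sprintf("NC%02d", 1:5)
chunks <- chunk_barcodes(true_BCs, nocell_BCs, max_cells = 20)
chunk_sizes <- unname(lengths(chunks))
```

With these toy inputs the 50 pairwise comparisons are split into chunks of 3, 3, 3 and 1 real barcodes, so no single distance matrix exceeds 15 cells. Because `cells_per_chunk` is rounded down, the number of chunks can exceed `min_chunks`, which errs on the side of smaller matrices.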
+1 −1
@@ -3,7 +3,7 @@
# Pipeline to run UMI-seq analysis from fastq to read count tables.
# Authors: Swati Parekh, Christoph Ziegenhain, Beate Vieth & Ines Hellmann
# Contact: sparekh@age.mpg.de or christoph.ziegenhain@ki.se
-vers=2.9.4e
+vers=2.9.4f
currentv=$(curl -s https://raw.githubusercontent.com/sdparekh/zUMIs/main/zUMIs.sh | grep '^vers=' | cut -f2 -d "=")
if [ "$currentv" != "$vers" ] ; then
    echo -e "------------- \n\n Good news! A newer version of zUMIs is available at https://github.com/sdparekh/zUMIs \n\n-------------";