Commit 186d5c7d authored by TomKellyGenetics

support inDrops-v3 with dedicated subroutine

parent eb77c3ad
+3 −3
@@ -165,7 +165,7 @@ automatically but can be configured with the `--chemistry` argument.
We are developing technologies to support dual indexes and full length scRNA kits.
We are developing technologies to support dual indexes and full length scRNA kits.


Experimental technologies (not yet supported):
Experimental technologies (not yet supported):
-  inDrops version 3 (8bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
-  inDrops version 3 (16bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
-  Sci-Seq (8bp UMI, 10bp barcode): sciseq
-  Sci-Seq (8bp UMI, 10bp barcode): sciseq
-  SPLiT-Seq (10bp UMI, 18bp barcode): splitseq
-  SPLiT-Seq (10bp UMI, 18bp barcode): splitseq
-  SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad
-  SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad
@@ -980,15 +980,15 @@ Mandatory arguments to long options are mandatory for short options too.
                                  Quartz-Seq2 (14bp barcode, 8bp UMI): quartzseq2-384
                                  Quartz-Seq2 (14bp barcode, 8bp UMI): quartzseq2-384
                                  Quartz-Seq2 (15bp barcode, 8bp UMI): quartzseq2-1536
                                  Quartz-Seq2 (15bp barcode, 8bp UMI): quartzseq2-1536
                                  SCRB-Seq (6bp barcode, 10bp UMI): scrbseq, mcscrbseq
                                  SCRB-Seq (6bp barcode, 10bp UMI): scrbseq, mcscrbseq
                                  Smart-seq2-UMI, Smart-seq3 (16bp barcode, 8bp UMI): smartseq
                                  SeqWell (12bp barcode, 8bp UMI): seqwell
                                  SeqWell (12bp barcode, 8bp UMI): seqwell
                                  SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad
                                  SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad
                                Custom inputs are also supported by giving the name "custom" and length of barcode and UMI separated by "_"
                                Custom inputs are also supported by giving the name "custom" and length of barcode and UMI separated by "_"
                                  e.g. Custom (16bp barcode, 10bp UMI): custom_16_10
                                  e.g. Custom (16bp barcode, 10bp UMI): custom_16_10


                                Experimental technologies (not yet supported):
                                Experimental technologies (not yet supported):
                                  inDrops version 3 (8bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
                                  inDrops version 3 (16bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
                                  Sci-Seq (8bp UMI, 10bp barcode): sciseq
                                  Sci-Seq (8bp UMI, 10bp barcode): sciseq
                                  Smart-seq2-UMI, Smart-seq3 (11bp barcode, 8bp UMI): smartseq


  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)
  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)
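
The help text above describes custom technologies named as "custom" followed by the barcode and UMI lengths, separated by "_". A minimal sketch of how such a name could be split into its two lengths is given below. The function name `parse_custom_technology` is hypothetical and is not taken from the script; only the variable names `barcodelength` and `umilength` appear in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the source script): split a name of the
# form custom_<barcode length>_<UMI length> into the two lengths.
parse_custom_technology() {
    local technology="$1"
    if [[ "$technology" =~ ^custom_([0-9]+)_([0-9]+)$ ]]; then
        barcodelength="${BASH_REMATCH[1]}"
        umilength="${BASH_REMATCH[2]}"
    else
        echo "Error: expected custom_<barcode length>_<UMI length>, got '$technology'" >&2
        return 1
    fi
}

# Example from the help text: 16bp barcode, 10bp UMI
parse_custom_technology "custom_16_10"
echo "barcode=${barcodelength}bp UMI=${umilength}bp"
```

Anchoring the pattern with `^` and `$` rejects names with missing or extra fields instead of silently accepting a partial match.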


+3 −3
@@ -65,7 +65,7 @@
</ul>
</ul>
<p>All technologies support 3' single-cell RNA-Seq. Barcode adjustments and whitelists are changed automatically. For 5' single-cell RNA-Seq, this is only supported for 10x Genomics version 2 chemistry. This is detected automatically but can be configured with the <code>--chemistry</code> argument.</p>
<p>All technologies support 3' single-cell RNA-Seq. Barcode adjustments and whitelists are changed automatically. For 5' single-cell RNA-Seq, this is only supported for 10x Genomics version 2 chemistry. This is detected automatically but can be configured with the <code>--chemistry</code> argument.</p>
<p>We are developing technologies to support dual indexes and full length scRNA kits.</p>
<p>We are developing technologies to support dual indexes and full length scRNA kits.</p>
<p>Experimental technologies (not yet supported): - inDrops version 3 (8bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3 - Sci-Seq (8bp UMI, 10bp barcode): sciseq - SPLiT-Seq (10bp UMI, 18bp barcode): splitseq - SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad</p>
<p>Experimental technologies (not yet supported): - inDrops version 3 (16bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3 - Sci-Seq (8bp UMI, 10bp barcode): sciseq - SPLiT-Seq (10bp UMI, 18bp barcode): splitseq - SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad</p>
<h4 id="dual-indexing">Dual-indexing</h4>
<h4 id="dual-indexing">Dual-indexing</h4>
<p>For dual-indexed technologies such as inDrops-v3, Sci-Seq, SmartSeq3 it is advised to use &quot;bcl2fastq&quot; before calling UniverSC:</p>
<p>For dual-indexed technologies such as inDrops-v3, Sci-Seq, SmartSeq3 it is advised to use &quot;bcl2fastq&quot; before calling UniverSC:</p>
<pre><code>   /usr/local/bin/bcl2fastq  -v --runfolder-dir &quot;/path/to/illumina/bcls&quot;  --output-dir &quot;./Data/Intensities/BaseCalls&quot;\
<pre><code>   /usr/local/bin/bcl2fastq  -v --runfolder-dir &quot;/path/to/illumina/bcls&quot;  --output-dir &quot;./Data/Intensities/BaseCalls&quot;\
@@ -476,15 +476,15 @@ Mandatory arguments to long options are mandatory for short options too.
                                  Quartz-Seq2 (14bp barcode, 8bp UMI): quartzseq2-384
                                  Quartz-Seq2 (14bp barcode, 8bp UMI): quartzseq2-384
                                  Quartz-Seq2 (15bp barcode, 8bp UMI): quartzseq2-1536
                                  Quartz-Seq2 (15bp barcode, 8bp UMI): quartzseq2-1536
                                  SCRB-Seq (6bp barcode, 10bp UMI): scrbseq, mcscrbseq
                                  SCRB-Seq (6bp barcode, 10bp UMI): scrbseq, mcscrbseq
                                  Smart-seq2-UMI, Smart-seq3 (16bp barcode, 8bp UMI): smartseq
                                  SeqWell (12bp barcode, 8bp UMI): seqwell
                                  SeqWell (12bp barcode, 8bp UMI): seqwell
                                  SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad
                                  SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad
                                Custom inputs are also supported by giving the name &quot;custom&quot; and length of barcode and UMI separated by &quot;_&quot;
                                Custom inputs are also supported by giving the name &quot;custom&quot; and length of barcode and UMI separated by &quot;_&quot;
                                  e.g. Custom (16bp barcode, 10bp UMI): custom_16_10
                                  e.g. Custom (16bp barcode, 10bp UMI): custom_16_10


                                Experimental technologies (not yet supported):
                                Experimental technologies (not yet supported):
                                  inDrops version 3 (8bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
                                  inDrops version 3 (16bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
                                  Sci-Seq (8bp UMI, 10bp barcode): sciseq
                                  Sci-Seq (8bp UMI, 10bp barcode): sciseq
                                  Smart-seq2-UMI, Smart-seq3 (11bp barcode, 8bp UMI): smartseq


  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)
  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)


+3 −3
@@ -165,7 +165,7 @@ automatically but can be configured with the `--chemistry` argument.
We are developing technologies to support dual indexes and full length scRNA kits.
We are developing technologies to support dual indexes and full length scRNA kits.


Experimental technologies (not yet supported):
Experimental technologies (not yet supported):
-  inDrops version 3 (8bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
-  inDrops version 3 (16bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
-  Sci-Seq (8bp UMI, 10bp barcode): sciseq
-  Sci-Seq (8bp UMI, 10bp barcode): sciseq
-  SPLiT-Seq (10bp UMI, 18bp barcode): splitseq
-  SPLiT-Seq (10bp UMI, 18bp barcode): splitseq
-  SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad
-  SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad
@@ -980,15 +980,15 @@ Mandatory arguments to long options are mandatory for short options too.
                                  Quartz-Seq2 (14bp barcode, 8bp UMI): quartzseq2-384
                                  Quartz-Seq2 (14bp barcode, 8bp UMI): quartzseq2-384
                                  Quartz-Seq2 (15bp barcode, 8bp UMI): quartzseq2-1536
                                  Quartz-Seq2 (15bp barcode, 8bp UMI): quartzseq2-1536
                                  SCRB-Seq (6bp barcode, 10bp UMI): scrbseq, mcscrbseq
                                  SCRB-Seq (6bp barcode, 10bp UMI): scrbseq, mcscrbseq
                                  Smart-seq2-UMI, Smart-seq3 (16bp barcode, 8bp UMI): smartseq
                                  SeqWell (12bp barcode, 8bp UMI): seqwell
                                  SeqWell (12bp barcode, 8bp UMI): seqwell
                                  SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad
                                  SureCell (18bp barcode, 8bp UMI): surecell, ddseq, biorad
                                Custom inputs are also supported by giving the name "custom" and length of barcode and UMI separated by "_"
                                Custom inputs are also supported by giving the name "custom" and length of barcode and UMI separated by "_"
                                  e.g. Custom (16bp barcode, 10bp UMI): custom_16_10
                                  e.g. Custom (16bp barcode, 10bp UMI): custom_16_10


                                Experimental technologies (not yet supported):
                                Experimental technologies (not yet supported):
                                  inDrops version 3 (8bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
                                  inDrops version 3 (16bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
                                  Sci-Seq (8bp UMI, 10bp barcode): sciseq
                                  Sci-Seq (8bp UMI, 10bp barcode): sciseq
                                  Smart-seq2-UMI, Smart-seq3 (11bp barcode, 8bp UMI): smartseq


  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)
  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)


+38 −9
@@ -216,7 +216,7 @@ Mandatory arguments to long options are mandatory for short options too.
                                  e.g. Custom (16bp barcode, 10bp UMI): custom_16_10
                                  e.g. Custom (16bp barcode, 10bp UMI): custom_16_10


                                Experimental technologies (not yet supported):
                                Experimental technologies (not yet supported):
                                  inDrops version 3 (8bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
                                  inDrops version 3 (16bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
                                  Sci-Seq (8bp UMI, 10bp barcode): sciseq                                  
                                  Sci-Seq (8bp UMI, 10bp barcode): sciseq                                  


  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)
  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)
@@ -705,9 +705,9 @@ elif [[ "$technology" == "indrop-v1" ]] || [[ "$technology" == "indrop-v2" ]]; t
    umilength=6
    umilength=6
    minlength=16
    minlength=16
elif [[ "$technology" == "indrop-v3" ]]; then
elif [[ "$technology" == "indrop-v3" ]]; then
    barcodelength=11
    barcodelength=16
    umilength=6
    umilength=6
    minlength=8
    minlength=16
elif [[ "$technology" == "marsseq-v1" ]]; then
elif [[ "$technology" == "marsseq-v1" ]]; then
    barcodelength=6
    barcodelength=6
    umilength=10
    umilength=10
@@ -1523,11 +1523,10 @@ else
        fi
        fi
        if [[ "$technology" == "indrop-v"* ]]; then
        if [[ "$technology" == "indrop-v"* ]]; then
            if [[ "$technology" == "indrop-v1" ]] || [[ $technology"" == "indrop-v2" ]]; then
            if [[ "$technology" == "indrop-v1" ]] || [[ $technology"" == "indrop-v2" ]]; then
                ${MAKEINDROPBARCODES} ${whitelistdir}/inDrop_gel_barcode1_list.txt ${whitelistdir}/inDrop_gel_barcode2_list.txt v2 ${whitelistdir}
                 perl ${MAKEINDROPBARCODES} ${whitelistdir}/inDrop_gel_barcode1_list.txt ${whitelistdir}/inDrop_gel_barcode2_list.txt v2 ${whitelistdir}
            elif [[ "$technology" == "indrop-v3" ]]; then
            elif [[ "$technology" == "indrop-v3" ]]; then
                #ignore barcodes in index (R1 only)
                #allow for barcodes in index (I1) and R1
                cp ${whitelistdir}/inDrop_gel_barcode2_list.txt ${whitelistdir}/inDrop-v3_barcodes.txt
                ${MAKEINDROPBARCODES} ${whitelistdir}/inDrop_gel_barcode1_list.txt ${whitelistdir}/inDrop_gel_barcode2_list.txt v3 ${whitelistdir}
                #${MAKEINDROPBARCODES} ${whitelistdir}/inDrop_gel_barcode1_list.txt ${whitelistdir}/inDrop_gel_barcode2_list.txt v3 ${whitelistdir}
            fi
            fi
        else
        else
            #generating permutations of ATCG of barcode length (non-standard evaluation required to run in script)
            #generating permutations of ATCG of barcode length (non-standard evaluation required to run in script)
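
For inDrops v3 the hunk above switches from copying a single barcode list to building a whitelist from both gel barcode lists, since the 16bp barcode is assembled from an 8bp index read and 8bp of R1. The sketch below shows one way to produce every pairwise concatenation of two lists. It is an illustration only: `combine_barcodes` is a hypothetical name, and the real `${MAKEINDROPBARCODES}` script may apply further steps (for example reverse complementing) that are not visible in this diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the MAKEINDROPBARCODES script): print every
# concatenation of a barcode from the first list with one from the second.
combine_barcodes() {
    local list1="$1" list2="$2"
    local first second
    while IFS= read -r first; do
        [[ -z "$first" ]] && continue      # skip empty lines
        while IFS= read -r second; do
            [[ -z "$second" ]] && continue
            printf '%s%s\n' "$first" "$second"
        done < "$list2"
    done < "$list1"
}

# Usage with two small example lists (file names are placeholders)
workdir=$(mktemp -d)
printf 'AAAAAAAA\nCCCCCCCC\n' > "$workdir/barcode1.txt"
printf 'GGGGGGGG\nTTTTTTTT\n' > "$workdir/barcode2.txt"
combine_barcodes "$workdir/barcode1.txt" "$workdir/barcode2.txt" > "$workdir/whitelist.txt"
cat "$workdir/whitelist.txt"
```

The output size is the product of the two list sizes, so for large lists a streaming approach like this avoids holding the whole whitelist in memory.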
@@ -2175,6 +2174,36 @@ else
            sed -E '2~2s/^(.{8})(.{8}).{4}(.{6})/\1\2\3/g' > ${crIN}/.temp
            sed -E '2~2s/^(.{8})(.{8}).{4}(.{6})/\1\2\3/g' > ${crIN}/.temp
            mv ${crIN}/.temp $convFile
            mv ${crIN}/.temp $convFile
        done
        done
    # inDrops: migrate dual indexes to barcode
    # https://github.com/BUStools/bustools/issues/4
    if [[ "$technology" == "indrop-v3" ]]; then
        echo "  ...processing for ${technology}"
        if [[ $verbose ]]; then
            echo "Note: inDrops v3 should be demultiplexed by sample index I2 (R3 or 4) if multiple samples are sequenced"
        fi
        for convFile in "${convFiles[@]}"; do
            read=$convFile
            convR1=$read
            convR2=$(echo $read | perl -pne 's/(.*)_R1/$1_R2/' )
            convI1=$(echo $read | perl -pne 's/(.*)_R1/$1_I1/' )
            convI2=$(echo $read | perl -pne 's/(.*)_R1/$1_I2/' )

            # (R1 -> R2; R2 -> I1; R3 -> I2; R4 -> R1)
            # v3 : summer 2016 redesign requiring manual demultiplexing.
            # R1 is the biological read (R2).
            # R2 (i7) carries the first half of the gel barcode (I1). <- which cell
            # R3 (i5) carries the library index (I2). <- which sample/organism
            # R4 the second half of the gel barcode, the UMI and a fraction of the polyA tail (R1).
            # returns R1 with tag sequence removed (left trim) starting with 8bp UMI and corresponding reads for I1, I2, and R2

            echo "  ...concatenate barcodes to R1 from I1 index file"
            # concatenate barcodes from dual indexes to R1 as (bases 1-8 of the) barcode (bases 1-16), moving UMI to (17-22)
            # filter UMI reads by matching tag sequence ATTGCGCAATG (bases 1-11 of R1) and remove it as an adapter
            perl sub/ConcatenateDualIndexBarcodes.pl --additive=${convI1} --ref_fastq=${convR1} --out_dir $crIN

            #returns a combined R1 file with I1-R1 concatenated (I1 is prepended to the R1 barcode)
            mv $crIN/Concatenated_File.fastq ${convR1}
        done
    fi
    fi
    
    
    #QuartzSeq: remove adapter
    #QuartzSeq: remove adapter
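
The inDrops-v3 block in the hunk above prepends the I1 index to R1 so that the barcode occupies bases 1-16 and the UMI bases 17-22. A rough sketch of that concatenation using `paste` and `awk` follows. It is not the `sub/ConcatenateDualIndexBarcodes.pl` implementation: the function name `prepend_index_to_read` is hypothetical, and the sketch assumes uncompressed FASTQ files with four lines per record and identical read order in both files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the Perl subroutine used in the script):
# prepend each index read to the matching R1 read, for both the sequence
# and the quality line, and keep the R1 header and separator lines.
prepend_index_to_read() {
    local index_fastq="$1" read_fastq="$2"
    paste -d '\t' "$index_fastq" "$read_fastq" |
        awk -F '\t' '{
            if (NR % 4 == 2 || NR % 4 == 0) {
                print $1 $2    # sequence and quality lines: index then read
            } else {
                print $2       # header and separator lines: keep R1 as is
            }
        }'
}

# Usage with a single example record (file names are placeholders)
workdir=$(mktemp -d)
printf '@read1\nACGTACGT\n+\nIIIIIIII\n' > "$workdir/sample_I1.fastq"
printf '@read1\nCCCCCCCCGGGGGG\n+\nFFFFFFFFFFFFFF\n' > "$workdir/sample_R1.fastq"
prepend_index_to_read "$workdir/sample_I1.fastq" "$workdir/sample_R1.fastq"
```

Joining the quality strings alongside the sequences keeps each record internally consistent, which downstream tools generally require.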
@@ -2305,7 +2334,7 @@ else
            mv $crIN/SmartSeq3_parsed_I2.fastq ${convI2}
            mv $crIN/SmartSeq3_parsed_I2.fastq ${convI2}


            echo "  ...concatenate barcodes to R1 from I1 and I2 index files"
            echo "  ...concatenate barcodes to R1 from I1 and I2 index files"
            # filter UMI reads by matching tag sequence ATTGCGCAATG (bases 1-11 of R1) and remove as an adapters 
            # concatenate barcodes from dual indexes to R1 as barcode (bases 1-16)
            perl sub/ConcatenateDualIndexBarcodes.pl --additive=${convI1} --additive=${convI2} --ref_fastq=${convR1} --out_dir $crIN
            perl sub/ConcatenateDualIndexBarcodes.pl --additive=${convI1} --additive=${convI2} --ref_fastq=${convR1} --out_dir $crIN


            #returns a combined R1 file with I1-I2-R1 concatenated (I1 and I2 are R1 barcode)
            #returns a combined R1 file with I1-I2-R1 concatenated (I1 and I2 are R1 barcode)
+1 −1
@@ -206,7 +206,7 @@ Provides a conversion script to run multiple technologies and custom libraries w
                                  e.g. Custom (16bp barcode, 10bp UMI): custom_16_10
                                  e.g. Custom (16bp barcode, 10bp UMI): custom_16_10


                                Experimental technologies (not yet supported):
                                Experimental technologies (not yet supported):
                                  inDrops version 3 (8bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
                                  inDrops version 3 (16bp barcode, 6bp UMI): indrops-v3, 1cellbio-v3
                                  Sci-Seq (8bp UMI, 10bp barcode): sciseq                                  
                                  Sci-Seq (8bp UMI, 10bp barcode): sciseq                                  


           A barcode whitelist is provided for all beads or wells for the following technologies:
           A barcode whitelist is provided for all beads or wells for the following technologies: