Commit 3941fb43 authored by TomKellyGenetics

remove TSO from ICELL8 and update docs for chemistry

parent 8ac5f5c2
+2 −0
@@ -1076,6 +1076,8 @@ Mandatory arguments to long options are mandatory for short options too.
  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)

  -c,  --chemistry CHEM         Assay configuration, autodetection is not possible for converted files: SC3Pv2 (default), SC5P-PE, or SC5P-R2
                                    5′ scRNA-Seq ('SC5P-PE') is available only for 10x Genomics, ICELL8, SmartSeq, and STRT-Seq technologies
                                    All other technologies default to 3′ scRNA-Seq parameters. Only 10x Genomics and ICELL8 allow choosing which to use.
  -n,  --force-cells NUM        Force pipeline to use this number of cells, bypassing the cell detection algorithm.
  -j,  --jobmode MODE           Job manager to use. Valid options: local (default), sge, lsf, or a .template file
       --localcores NUM         Set max cores the pipeline may request at one time.
+2 −0
@@ -539,6 +539,8 @@ Mandatory arguments to long options are mandatory for short options too.
  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)

  -c,  --chemistry CHEM         Assay configuration, autodetection is not possible for converted files: SC3Pv2 (default), SC5P-PE, or SC5P-R2
                                    5′ scRNA-Seq ('SC5P-PE') is available only for 10x Genomics, ICELL8, SmartSeq, and STRT-Seq technologies
                                    All other technologies default to 3′ scRNA-Seq parameters. Only 10x Genomics and ICELL8 allow choosing which to use.
  -n,  --force-cells NUM        Force pipeline to use this number of cells, bypassing the cell detection algorithm.
  -j,  --jobmode MODE           Job manager to use. Valid options: local (default), sge, lsf, or a .template file
       --localcores NUM         Set max cores the pipeline may request at one time.
+2 −0
@@ -1076,6 +1076,8 @@ Mandatory arguments to long options are mandatory for short options too.
  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)

  -c,  --chemistry CHEM         Assay configuration, autodetection is not possible for converted files: SC3Pv2 (default), SC5P-PE, or SC5P-R2
                                    5′ scRNA-Seq ('SC5P-PE') is available only for 10x Genomics, ICELL8, SmartSeq, and STRT-Seq technologies
                                    All other technologies default to 3′ scRNA-Seq parameters. Only 10x Genomics and ICELL8 allow choosing which to use.
  -n,  --force-cells NUM        Force pipeline to use this number of cells, bypassing the cell detection algorithm.
  -j,  --jobmode MODE           Job manager to use. Valid options: local (default), sge, lsf, or a .template file
       --localcores NUM         Set max cores the pipeline may request at one time.
+71 −2
@@ -232,7 +232,10 @@ Mandatory arguments to long options are mandatory for short options too.

  -b,  --barcodefile FILE       Custom barcode list in plain text (with each line containing a barcode)
  
  -c,  --chemistry CHEM         Assay configuration, autodetection is not possible for converted files: 'SC3Pv2' (default), 'SC5P-PE', or 'SC5P-R2'
  -c,  --chemistry CHEM         Assay configuration, autodetection is not possible for converted files: 'SC3Pv2' (default), 'SC5P-PE', 'SC5P-R1', or 'SC5P-R2'
                                    5′ scRNA-Seq ('SC5P-PE') is available only for 10x Genomics, ICELL8, SmartSeq, and STRT-Seq technologies
                                    All other technologies default to 3′ scRNA-Seq parameters. Only 10x Genomics and ICELL8 allow choosing which to use.

  -n,  --force-cells NUM        Force pipeline to use this number of cells, bypassing the cell detection algorithm.
  -j,  --jobmode MODE           Job manager to use. Valid options: 'local' (default), 'sge', 'lsf', or a .template file
       --localcores NUM         Set max cores the pipeline may request at one time.
@@ -2637,12 +2640,78 @@ else
            fi
            
            if [[ $chemistry == "SC5P"* ]] || [[ $chemistry == "five"* ]]; then
                #remove TSO adapters (from ends)
                sed -E '
                    /^TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGG/ {
                    s/^TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGG/GGG/g
                    n
                    n
                    s/^(.{34})(.{3})/\2/
                }'  $convFile > ${crIN}/.temp
                mv ${crIN}/.temp $convFile
                sed -E '
                    /^AGATCGGAAGAGCGTCGTGTAGGG/ {
                    s/^AGATCGGAAGAGCGTCGTGTAGGG/GGG/g
                    n
                    n
                    s/^(.{21})(.{3})/\2/
                }'  $convFile > ${crIN}/.temp
                mv ${crIN}/.temp $convFile
                sed -E '
                    /^GCGTCGTGTAGGG/ {
                    s/^GCGTCGTGTAGGG/GGG/g
                    n
                    n
                    s/^(.{10})(.{3})/\2/
                }'  $convFile > ${crIN}/.temp
                mv ${crIN}/.temp $convFile
                sed -E '
                    /^TGTGAGAAAGGG/ {
                    s/^TGTGAGAAAGGG/GGG/g
                    n
                    n
                    s/(.{9})(.{3})/\2/
                }'  $convFile > ${crIN}/.temp
                mv ${crIN}/.temp $convFile
                #remove TSO adapters (from after cell barcode) #<- confirm tag sequence with Takara reps (appears to be CB-UMI-GGG)
                sed -E '
                    /^(.{11})(.{14})TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGG/ {
                    s/^(.{11})(.{14})TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGG/\1\2GGG/g
                    n
                    n
                    s/^(.{11})(.{14})(.{34})(.{3})/\1\2\4/
                }'  $convFile > ${crIN}/.temp
                mv ${crIN}/.temp $convFile
                sed -E '
                    /^(.{11})(.{14})AGATCGGAAGAGCGTCGTGTAGGG/ {
                    s/^(.{11})(.{14})AGATCGGAAGAGCGTCGTGTAGGG/\1\2GGG/g
                    n
                    n
                    s/^(.{11})(.{14})(.{21})(.{3})/\1\2\4/
                }'  $convFile > ${crIN}/.temp
                mv ${crIN}/.temp $convFile
                sed -E '
                    /^(.{11})(.{14})GCGTCGTGTAGGG/ {
                    s/^(.{11})(.{14})GCGTCGTGTAGGG/\1\2GGG/g
                    n
                    n
                    s/^(.{11})(.{14})(.{10})(.{3})/\1\2\4/
                }'  $convFile > ${crIN}/.temp
                mv ${crIN}/.temp $convFile
                sed -E '
                    /^(.{11})(.{14})TGTGAGAAAGGG/ {
                    s/^(.{11})(.{14})TGTGAGAAAGGG/\1\2GGG/g
                    n
                    n
                    s/(.{11})(.{14})(.{9})(.{3})/\1\2\4/
                }'  $convFile > ${crIN}/.temp
                mv ${crIN}/.temp $convFile
                #convert TSO to expected length for 10x 5' (TSS in R1 from base 39)
                echo " handling $convFile ..."
                tsoS="TTTCTTATATGGG" #<- confirm tag sequence with Takara reps
                tsoQ="IIIIIIIIIIIII"
                #Add 10x TSO characters to the end of the sequence
                cmd=$(echo 'sed -E "2~4s/(.{'$barcodelength'})(.{'${umilength}'})(.{3})/\1\2'$tsoS'/" '$convFile' > '${crIN}'/.temp')
                cmd=$(echo 'sed -E "2~4s/(.{'$barcodelength'})(.{'${umilength}'})(.{3})/\1\2'$tsoS'/" '$convFile' > '${crIN}'/.temp') #<- confirm tag sequence (GGG) with Takara reps
                if [[ $verbose ]]; then
                    echo technology $technology
                    echo barcode: $barcodelength
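
Note on the hunk above: the two TSO steps (trimming an adapter down to its trailing GGG, then swapping the GGG after the barcode and UMI for the 10x-length tag) can be tried in isolation on a single FASTQ record. The following is a minimal sketch, not the script's own code: the function names `trim_tso` and `pad_tso` and the sample reads are hypothetical, GNU sed is assumed, and the `tsoS` value is copied from the diff, where it is still marked as unconfirmed.

```shell
#!/usr/bin/env bash
# Sketch only: trim_tso, pad_tso and the sample reads are hypothetical.
# Requires GNU sed (-E, and the first~step address used in the diff).

# Replace a leading adapter that ends in GGG with GGG alone, then skip the
# "+" line and drop the same number of characters from the quality line.
trim_tso() {
    local adapter=$1
    local drop=$(( ${#adapter} - 3 ))   # adapter bases removed from the read
    sed -E "
        /^${adapter}/ {
            s/^${adapter}/GGG/
            n
            n
            s/^(.{${drop}})(.{3})/\2/
        }"
}

# On every sequence line (2nd of each 4), replace the 3 bases that follow
# the barcode and UMI with the longer tag sequence.
pad_tso() {
    local barcodelength=$1 umilength=$2
    local tsoS="TTTCTTATATGGG"   # value from the diff, flagged there as unconfirmed
    sed -E "2~4s/^(.{${barcodelength}})(.{${umilength}})(.{3})/\1\2${tsoS}/"
}

# Demo: a 20-base read that starts with the shortest adapter from the diff.
printf '@read1\nTGTGAGAAAGGGACGTACGT\n+\nABCDEFGHIJKLMNOPQRST\n' \
    | trim_tso TGTGAGAAAGGG
```

In the demo the sequence and quality lines both shrink from 20 to 11 characters, so the record stays consistent. `pad_tso` only touches sequence lines; the quality line would need a matching 13-character insert, presumably what `tsoQ` in the diff is for, although that step is not shown in this hunk.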
+4 −1
@@ -257,13 +257,16 @@ Provides a conversion script to run multiple technologies and custom libraries w
  -c,  --chemistry CHEM
            Assay configuration, autodetection is not possible for converted files:

                SC3Pv2 (default), SC3Pv3, SC5P-PE, or SC5P-R2
                SC3Pv2 (default), SC3Pv3, SC5P-PE, SC5P-R1, or SC5P-R2

            Chemistry can only be automatically detected for 10x Genomics Chromium as it relies
            on matches to a barcode whitelist. For other technologies we do not recommend changing
            the chemistry input. All samples are converted to contain the barcode and UMI in Read1
            as used for SC3Pv2. SC3Pv3 is only used for technologies with longer UMI.

            5′ scRNA-Seq ('SC5P-PE') is available only for 10x Genomics, ICELL8, SmartSeq, and STRT-Seq technologies
            All other technologies default to 3′ scRNA-Seq parameters. Only 10x Genomics and ICELL8 allow choosing which to use.

  -n,  --force-cells NUM
            Force pipeline to use this number of cells, bypassing the cell detection algorithm.
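
Note on the `--chemistry` text above: the accepted values and the script's 5′ check can be expressed as two small tests. This is a sketch under assumptions rather than code from the commit: the helper names `is_valid_chemistry` and `is_five_prime` are hypothetical, the value list is taken from the updated help text, and the 5′ test mirrors the `SC5P*` / `five*` condition that guards the TSO trimming in the script hunk.

```shell
#!/usr/bin/env bash
# Sketch only: both helper names are hypothetical.

# Succeeds if the value is one of the chemistry names listed in the help text.
is_valid_chemistry() {
    case "$1" in
        SC3Pv2|SC3Pv3|SC5P-PE|SC5P-R1|SC5P-R2) return 0 ;;
        *) return 1 ;;
    esac
}

# Same pattern test the script applies before trimming TSO adapters.
is_five_prime() {
    [[ $1 == "SC5P"* ]] || [[ $1 == "five"* ]]
}

# Demo over a few values; SC5P-XX is a deliberately invalid example.
for chemistry in SC3Pv2 SC5P-R1 SC5P-XX; do
    if ! is_valid_chemistry "$chemistry"; then
        echo "$chemistry: not a listed chemistry"
    elif is_five_prime "$chemistry"; then
        echo "$chemistry: 5' settings"
    else
        echo "$chemistry: 3' settings"
    fi
done
```

The sketch does not check the technology: the help text restricts 5′ chemistry to 10x Genomics, ICELL8, SmartSeq, and STRT-Seq, but the identifiers the script uses for those technologies do not appear in this diff.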