Commit 8ac5f5c2 authored by TomKellyGenetics

update documentation to clarify non-UMI and dual-indexed technologies

parent 40bda015
+22 −1
@@ -212,12 +212,24 @@ configured with the `--chemistry` argument.
For other technologies, the template switching oligonucleotide
is automatically converted to match the 10x sequence.

By default, UMIs are used where available, with the following
exceptions for non-UMI technologies:
ICELL8 v2, RamDA-Seq, Quartz-Seq, Smart-Seq, Smart-Seq2.
Other technologies can be forced to replace the UMI with a mock sequence,
so that reads are counted instead, with the `--non-umi` or `--read-only` arguments.
Forcing non-UMI processing is _not recommended_ unless you are
integrating non-UMI and UMI-based technologies. It is not necessary
to specify `--non-umi` for non-UMI techniques, as non-UMI settings are applied
automatically when applicable. For ICELL8 and Smart-Seq, where both
non-UMI (icell8-v2, smartseq2) and UMI-based (icell8-v3, smartseq3)
techniques are available, it is possible to specify which to use.
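
The default behaviour described above can be sketched as a small lookup. This is an illustration only, not UniverSC's own code: the function name is hypothetical, and apart from `icell8-v2`, `icell8-v3`, `smartseq2` and `smartseq3` the technology spellings are assumed.

```shell
# Hypothetical helper: succeeds when a technology defaults to non-UMI counting.
# Identifiers other than icell8-v2/v3 and smartseq2/3 are assumed spellings.
is_non_umi_default() {
  # Lower-case the name so that "SmartSeq2" and "smartseq2" are treated alike.
  tech=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$tech" in
    icell8-v2|ramda-seq|quartz-seq|smartseq|smartseq2) return 0 ;;
    *) return 1 ;;
  esac
}

for tech in smartseq2 smartseq3 icell8-v2 icell8-v3; do
  if is_non_umi_default "$tech"; then
    echo "$tech: reads are counted (mock UMI)"
  else
    echo "$tech: UMIs are counted"
  fi
done
```

Keeping the list in one `case` statement makes it easy to see which technologies would need `--non-umi` to be forced and which already behave that way.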

Single indexes are supported for STRT-Seq, Quartz-Seq, and RamDA-Seq.
Dual indexes are supported for inDrops-v3, SCI-RNA-Seq, scifi-seq, and Smart-Seq.
Combinatorial indexing technologies have linkers between barcodes removed
automatically to match the barcode whitelist.

#### Dual-indexing
#### Demultiplexing for dual-indexing


For dual-indexed technologies such as inDrops-v3, Sci-Seq, and SmartSeq3, it is advised to use "bcl2fastq"
before calling UniverSC:
@@ -229,6 +241,15 @@ before calling UniverSC:
                                --minimum-trimmed-read-length 0
```
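
The `--use-bases-mask` value used with bcl2fastq in this documentation (`Y26n,I8n,I8n,Y50n`) follows a fixed pattern: `Y` marks read cycles, `I` marks index cycles, and each trailing `n` ignores one further cycle. As a rough sketch, a hypothetical helper (not part of UniverSC or bcl2fastq) could assemble the same pattern from the four lengths:

```shell
# Hypothetical helper: build a mask in the pattern Y<read1>n,I<index1>n,I<index2>n,Y<read2>n.
# Arguments: read 1 length, index 1 (i7) length, index 2 (i5) length, read 2 length.
bases_mask() {
  printf 'Y%sn,I%sn,I%sn,Y%sn\n' "$1" "$2" "$3" "$4"
}

bases_mask 26 8 8 50   # prints Y26n,I8n,I8n,Y50n
```

Whether the trailing `n` is appropriate depends on how many cycles were actually sequenced, so the result should be checked against the run configuration.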


Please adjust the lengths for `--use-bases-mask` to match read 1, index 1 (i7), index 2 (i5), and read 2.
Ensure that `--create-fastq-for-index-reads` is used where possible. If a sequencing facility has demultiplexed
the samples for you without it, UniverSC will attempt to extract index sequences from the FASTQ headers in read 1.
Using `--no-lane-splitting` is optional, as UniverSC can process an arbitrary number of lanes.
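
To illustrate the idea of recovering indexes from read 1 headers, here is a minimal sketch. It is not UniverSC's implementation; it assumes Illumina-style headers in which the last colon-separated field holds the index sequence(s), and the function name and example read are made up.

```shell
# Sketch: print the index field of every FASTQ record.
# Header lines are lines 1, 5, 9, ... of the file; the index is assumed to be
# the last colon-separated field, e.g. "... 1:N:0:ATCACGAT+AGATCTCG".
extract_header_index() {
  awk 'NR % 4 == 1 { n = split($0, field, ":"); print field[n] }' "$@"
}

# Made-up single-record example, read from standard input.
printf '%s\n' \
  '@INSTR:1:FLOWCELL:1:1101:1000:2000 1:N:0:ATCACGAT+AGATCTCG' \
  'ACGTACGTACGT' \
  '+' \
  'FFFFFFFFFFFF' | extract_header_index
# prints ATCACGAT+AGATCTCG
```

For dual-indexed data the two indexes appear joined by `+` in this layout, so they would still need to be split before use.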

There is no need to specify index sequences for cell barcodes in the sample sheet: using "NNNNNNNN" will match all
samples, and the cell barcodes will be distinguished by the single-cell processing pipeline. Index sequences should
only be used to demultiplex samples and replicates (not cells).
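
As an illustration, a sample sheet with wildcard indexes could be generated as below. The helper name is hypothetical, and the column names follow the common Illumina `[Data]` layout as an assumption; check them against the sheet supplied by your sequencing facility.

```shell
# Hypothetical helper: write a minimal sample sheet whose indexes match every read.
# Arguments: output path, sample name. Column names are an assumed common layout.
write_wildcard_sample_sheet() {
  cat > "$1" <<EOF
[Data]
Sample_ID,Sample_Name,index,index2
$2,$2,NNNNNNNN,NNNNNNNN
EOF
}

sheet=$(mktemp)
write_wildcard_sample_sheet "$sheet" "sample1"
cat "$sheet"
rm -f "$sheet"
```

The number of `N` characters should match the index lengths given to `--use-bases-mask`.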

#### Custom inputs

Custom inputs are also supported by giving the name "custom" followed by the barcode length and the UMI length, separated by "_" characters.
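
The naming scheme (for example `custom_16_10` for a 16 bp barcode and a 10 bp UMI) can be unpacked with plain shell parameter expansion. The helper below is a hypothetical sketch of that parsing, not the code UniverSC uses.

```shell
# Hypothetical helper: split a name such as custom_16_10 into barcode and UMI lengths.
parse_custom_technology() {
  case "$1" in
    custom_*_*) ;;
    *) echo "expected custom_<barcode length>_<UMI length>" >&2; return 1 ;;
  esac
  lengths=${1#custom_}      # e.g. 16_10
  barcode=${lengths%%_*}    # text before the first underscore
  umi=${lengths#*_}         # text after the first underscore
  for value in "$barcode" "$umi"; do
    case "$value" in
      ''|*[!0-9]*) echo "lengths must be whole numbers" >&2; return 1 ;;
    esac
  done
  echo "barcode=${barcode} umi=${umi}"
}

parse_custom_technology custom_16_10   # prints barcode=16 umi=10
```

Rejecting malformed names early gives a clearer error than passing an unusable length on to later steps.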
+4 −1
@@ -99,13 +99,16 @@
<li>SureCell (18 bp barcode, 8 bp UMI): surecell, ddseq, biorad</li>
</ul>
<p>All technologies support 3' single-cell RNA-Seq. Barcode adjustments and whitelists are changed automatically. For 5' single-cell RNA-Seq, this is only supported for 10x Genomics version 2 chemistry, ICELL8, Smart-Seq, and STRT-Seq. For 10x Genomics, this is detected automatically but can be configured with the <code>--chemistry</code> argument. For other technologies, the template switching oligonucleotide is automatically converted to match the 10x sequence.</p>
<p>By default, UMIs are used where available, with the following exceptions for non-UMI technologies: ICELL8 v2, RamDA-Seq, Quartz-Seq, Smart-Seq, Smart-Seq2. Other technologies can be forced to replace the UMI with a mock sequence, so that reads are counted instead, with the <code>--non-umi</code> or <code>--read-only</code> arguments. Forcing non-UMI processing is <em>not recommended</em> unless you are integrating non-UMI and UMI-based technologies. It is not necessary to specify <code>--non-umi</code> for non-UMI techniques, as non-UMI settings are applied automatically when applicable. For ICELL8 and Smart-Seq, where both non-UMI (icell8-v2, smartseq2) and UMI-based (icell8-v3, smartseq3) techniques are available, it is possible to specify which to use.</p>
<p>Single indexes are supported for STRT-Seq, Quartz-Seq, and RamDA-Seq. Dual indexes are supported for inDrops-v3, SCI-RNA-Seq, scifi-seq, and Smart-Seq. Combinatorial indexing technologies have linkers between barcodes removed automatically to match the barcode whitelist.</p>
<h4 id="dual-indexing">Dual-indexing</h4>
<h4 id="demultiplexing-for-dual-indexing">Demultiplexing for dual-indexing</h4>
<p>For dual-indexed technologies such as inDrops-v3, Sci-Seq, and SmartSeq3, it is advised to use &quot;bcl2fastq&quot; before calling UniverSC:</p>
<pre><code>   /usr/local/bin/bcl2fastq  -v --runfolder-dir &quot;/path/to/illumina/bcls&quot;  --output-dir &quot;./Data/Intensities/BaseCalls&quot;\
                                --sample-sheet &quot;/path/to/SampleSheet.csv&quot; --create-fastq-for-index-reads\
                                --use-bases-mask Y26n,I8n,I8n,Y50n  --mask-short-adapter-reads 0\
                                --minimum-trimmed-read-length 0</code></pre>
<p>Please adjust the lengths for <code>--use-bases-mask</code> to match read 1, index 1 (i7), index 2 (i5), and read 2. Ensure that <code>--create-fastq-for-index-reads</code> is used where possible. If a sequencing facility has demultiplexed the samples for you without it, UniverSC will attempt to extract index sequences from the FASTQ headers in read 1. Using <code>--no-lane-splitting</code> is optional, as UniverSC can process an arbitrary number of lanes.</p>
<p>There is no need to specify index sequences for cell barcodes in the sample sheet: using &quot;NNNNNNNN&quot; will match all samples, and the cell barcodes will be distinguished by the single-cell processing pipeline. Index sequences should only be used to demultiplex samples and replicates (not cells).</p>
<h4 id="custom-inputs">Custom inputs</h4>
<p>Custom inputs are also supported by giving the name &quot;custom&quot; followed by the barcode length and the UMI length, separated by &quot;_&quot; characters.</p>
<p>e.g. Custom (16bp barcode, 10bp UMI): <code>custom_16_10</code></p>
+23 −2
@@ -6,7 +6,7 @@ affiliations:
   index: 1
 - name: "RIKEN Center for Sustainable Resource Sciences, Suehiro-cho-1-7-22, Tsurumi Ward, Yokohama, Kanagawa 230-0045, Japan"
   index: 2
date: "Tuesday 27 April 2021"
date: "Wednesday 28 April 2021"
output:
  prettydoc::html_pretty:
       theme: cayman
@@ -212,12 +212,24 @@ configured with the `--chemistry` argument.
For other technologies, the template switching oligonucleotide
is automatically converted to match the 10x sequence.

By default, UMIs are used where available, with the following
exceptions for non-UMI technologies:
ICELL8 v2, RamDA-Seq, Quartz-Seq, Smart-Seq, Smart-Seq2.
Other technologies can be forced to replace the UMI with a mock sequence,
so that reads are counted instead, with the `--non-umi` or `--read-only` arguments.
Forcing non-UMI processing is _not recommended_ unless you are
integrating non-UMI and UMI-based technologies. It is not necessary
to specify `--non-umi` for non-UMI techniques, as non-UMI settings are applied
automatically when applicable. For ICELL8 and Smart-Seq, where both
non-UMI (icell8-v2, smartseq2) and UMI-based (icell8-v3, smartseq3)
techniques are available, it is possible to specify which to use.

Single indexes are supported for STRT-Seq, Quartz-Seq, and RamDA-Seq.
Dual indexes are supported for inDrops-v3, SCI-RNA-Seq, scifi-seq, and Smart-Seq.
Combinatorial indexing technologies have linkers between barcodes removed
automatically to match the barcode whitelist.

#### Dual-indexing
#### Demultiplexing for dual-indexing


For dual-indexed technologies such as inDrops-v3, Sci-Seq, and SmartSeq3, it is advised to use "bcl2fastq"
before calling UniverSC:
@@ -229,6 +241,15 @@ before calling UniverSC:
                                --minimum-trimmed-read-length 0
```


Please adjust the lengths for `--use-bases-mask` to match read 1, index 1 (i7), index 2 (i5), and read 2.
Ensure that `--create-fastq-for-index-reads` is used where possible. If a sequencing facility has demultiplexed
the samples for you without it, UniverSC will attempt to extract index sequences from the FASTQ headers in read 1.
Using `--no-lane-splitting` is optional, as UniverSC can process an arbitrary number of lanes.

There is no need to specify index sequences for cell barcodes in the sample sheet: using "NNNNNNNN" will match all
samples, and the cell barcodes will be distinguished by the single-cell processing pipeline. Index sequences should
only be used to demultiplex samples and replicates (not cells).

#### Custom inputs

Custom inputs are also supported by giving the name "custom" followed by the barcode length and the UMI length, separated by "_" characters.
+25 −2
@@ -95,6 +95,11 @@ Provides a conversion script to run multiple technologies and custom libraries w
                /usr/local/bin/bcl2fastq -v --runfolder-dir "/path/to/illumina/bcls"  --output-dir "./Data/Intensities/BaseCalls"\
                                            --sample-sheet "/path/to/SampleSheet.csv" --create-fastq-for-index-reads

            An Index 1 file is required for the following technologies, in addition to those requiring Index 2.
            UniverSC will attempt to extract the index sequences from Read 1 headers if the file is not found:

                   inDrops-v3, STRT-Seq-C1

  -I2, --index2 FILE
            Index (I2) FASTQ file to pass to Cell Ranger (OPTIONAL). Contains the indexes 
            for each sample. (In the case of Illumina paired-ends these are the i5 indexes).
@@ -149,6 +154,11 @@ Provides a conversion script to run multiple technologies and custom libraries w
            Note: processing dual-indexed files is not stable. If behaviour is not as you expect,
            we welcome you to contact us on GitHub to help you out.

            Index 1 and Index 2 files are required for the following technologies.
            UniverSC will attempt to extract the index sequences from Read 1 headers if the files are not found:

                   SCI-RNA-Seq, SCI-RNA-Seq3, scifi-seq, Smart-Seq2, Smart-Seq3, STRT-Seq-2i

  -f,  --file NAME
            Path and the name of FASTQ files to pass to Cell Ranger (prefix before R1 or R2)

@@ -227,6 +237,19 @@ Provides a conversion script to run multiple technologies and custom libraries w
            Where no known barcodes are available all possible barcodes of the expected length are
            generated and converted if the permutations have not been computed already.

            Linkers are automatically removed from the following technologies:

                  BD Rhapsody, inDrops-v1, Microwell-Seq, SCI-Seq3, Split-Seq, Smart-Seq2, Smart-Seq3, SureCell

            The following technologies default to non-UMI parameters (others can be forced):

                  ICELL8-v2, RamDA-Seq, Quartz-Seq, Smart-Seq, Smart-Seq2

            The following technologies require Index 1 or Index 2 sequences (see above):

                  inDrops-v3, SCI-RNA-Seq, SCI-RNA-Seq3, scifi-seq, Smart-Seq2, Smart-Seq3, STRT-Seq-2i, STRT-Seq-C1


  -b,  --barcodefile FILE
            Custom barcode list in plain text (with each line containing a barcode). Please provide
            the name of a text file in the working directory or the path to it.