Commit e113425d authored by Tamara Hodgetts's avatar Tamara Hodgetts
Browse files

Merge branch 'dev' into module_version_2

parents 8a830c0b a0cf7ead
Loading
Loading
Loading
Loading
+0 −7
Original line number Diff line number Diff line
@@ -13,12 +13,5 @@ testing*

R_sessionInfo.log

dev/docker/static_reports/test_data/*
dev/docker/static_reports/test_output/*
!dev/docker/static_reports/test_data/_dummy.txt
!dev/docker/static_reports/test_output/_dummy.txt

!tests/data/

.vscode/
profile.svg
+24 −5
Original line number Diff line number Diff line
@@ -3,14 +3,33 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v1.0.0 - [date]
## [1.0.0] - 2021-11-05

Initial release of nf-core/cutandrun, created with the [nf-core](https://nf-co.re/) template.

### `Added`
This pipeline is a best-practice bioinformatic analysis pipeline for CUT&Run and CUT&Tag experimental protocols that where developed to study protein-DNA interactions and epigenomic profiling.

### `Fixed`
nf-core/cutandrun was originally written by Chris Cheshire ([@chris-cheshire](https://github.com/chris-cheshire)) and Charlotte West ([@charlotte-west](https://github.com/charlotte-west)) from [Luscombe Lab](https://www.crick.ac.uk/research/labs/nicholas-luscombe) at [The Francis Crick Institute](https://www.crick.ac.uk/), London, UK.

### `Dependencies`
The pipeline structure and parts of the downstream analysis were adapted from the original CUT&Tag analysis [protocol](https://yezhengstat.github.io/CUTTag_tutorial/) from the [Henikoff Lab](https://research.fredhutch.org/henikoff/en.html).

### `Deprecated`
We thank Harshil Patel ([@drpatelh](https://github.com/drpatelh)) and everyone in the Luscombe Lab ([@luslab](https://github.com/luslab)) for their extensive assistance in the development of this pipeline.

### Pipeline summary

1. Check input files
2. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html))
3. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
4. Adapter and quality trimming ([`Trim Galore!`](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/))
5. Alignment to both target and spike-in genomes ([`Bowtie 2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml))
6. Filter on quality, sort and index alignments ([`samtools`](https://sourceforge.net/projects/samtools/files/samtools/))
7. Duplicate read marking ([`picard`](https://broadinstitute.github.io/picard/))
8. Create bedGraph files ([`bedtools`](https://github.com/arq5x/bedtools2/)
9. Create bigWig coverage files ([`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
10. Peak calling specifically tailored for low background noise experiments ([`SEACR`](https://github.com/FredHutch/SEACR))
11. Consensus peak merging and reporting ([`bedtools`](https://github.com/arq5x/bedtools2/))
12. Quality control and analysis:
    1. Alignment, fragment length and peak analysis and replicate reproducibility ([`python`](https://www.python.org/))
    2. Heatmap peak analysis ([`deepTools`](https://github.com/deeptools/deepTools/))
13. Genome browser session ([`IGV`](https://software.broadinstitute.org/software/igv/))
14. Present QC for raw read, alignment and duplicate reads ([`MultiQC`](http://multiqc.info/))
+4 −3
Original line number Diff line number Diff line
@@ -3,7 +3,7 @@
[![GitHub Actions CI Status](https://github.com/nf-core/cutandrun/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/cutandrun/actions?query=workflow%3A%22nf-core+CI%22)
[![GitHub Actions Linting Status](https://github.com/nf-core/cutandrun/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/cutandrun/actions?query=workflow%3A%22nf-core+linting%22)
[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/cutandrun/results)
[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
[![DOI](http://img.shields.io/badge/DOI-10.5281/zenodo.5653536-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.5653536)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.04.0-23aa62.svg?labelColor=000000)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
@@ -82,6 +82,8 @@ The pipeline structure and parts of the downstream analysis were adapted from th

We thank Harshil Patel ([@drpatelh](https://github.com/drpatelh)) and everyone in the Luscombe Lab ([@luslab](https://github.com/luslab)) for their extensive assistance in the development of this pipeline.

![[The Francis Crick Institute](https://www.crick.ac.uk/)](docs/images/crick_logo.png)

## Contributions and Support

If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
@@ -90,8 +92,7 @@ For further information or help, don't hesitate to get in touch on the [Slack `#

## Citations

<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
<!-- If you use  nf-core/cutandrun for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
If you use  nf-core/cutandrun for your analysis, please cite it using the following doi: [10.5281/zenodo.5653536](https://doi.org/10.5281/zenodo.5653536)

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

+0 −14
Original line number Diff line number Diff line
@@ -57,21 +57,7 @@ section_comments:
        - It can be due to the Tn5 preference.
        - What you might be detecting is the 10-bp periodicity that shows up as a sawtooth pattern in the length distribution. If so, this is normal and will not affect alignment or peak calling. In any case we do not recommend trimming as the bowtie2 parameters that we list will give accurate mapping information without trimming.

# fastqc_config:
#     fastqc_theoretical_gc: "hg38_genome"

# extra_fn_clean_exts:
#   - ".spikein.bowtie2"
#   - "_processedFile"
# extra_fn_clean_trim:
#   - "#"
#   - ".myext"

# Customise the module search patterns to speed up execution time
#  - Skip module sub-tools that we are not interested in
#  - Replace file-content searching with filename pattern searching
#  - Don't add anything that is the same as the MultiQC default
# See https://multiqc.info/docs/#optimise-file-search-patterns for details
sp:
    cutadapt:
        fn: '*trimming_report.txt'
+4 −2
Original line number Diff line number Diff line
@@ -35,7 +35,8 @@ peak_perc = 0
print('Reading file')

# Read file in using dask
ddf_inter = dd.read_csv(args.intersect, sep='\t', header=None, names=['chrom','start','end','overlap_1','key','a_name','b_name','count'])
ddf_inter = dd.read_csv(args.intersect, sep='\t', header=None, names=['chrom','start','end','overlap_1','key','a_name','b_name','count'],
    dtype={'chrom':str,'start':np.int64,'end':np.int64,'overlap_1':np.float64,'key':np.float64,'a_name':str,'b_name':str,'count':np.int64})

# Find number of files
numfiles = ddf_inter['b_name'].max().compute()
@@ -44,7 +45,8 @@ numfiles = ddf_inter['b_name'].max().compute()
if isinstance(numfiles, str):
    print('Detected single file, reloading table')
    numfiles = 1
    ddf_inter = dd.read_csv(args.intersect, sep='\t', header=None, names=['chrom','start','end','overlap_1','overlap_2','key','name','count'])
    ddf_inter = dd.read_csv(args.intersect, sep='\t', header=None, names=['chrom','start','end','overlap_1','overlap_2','key','name','count'],
        dtype={'chrom':str,'start':np.int64,'end':np.int64,'overlap_1':np.float64,'overlap_2':np.float64,'key':str,'name':str,'count':np.int64})

print('Number of files: ' + str(numfiles))

Loading