Update reprocess_deposited_data.md (c3739133) · Commits · github_fork / SHARE Seq AlignmentV2

reprocess_deposited_data.md


The SHARE-seqV2 alignment pipeline generates a pair of fastqs for each sample. T…
We have deposited the SHARE-seqV2 data on [GEO](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE207308) (accession GSE207308).
As an example, the species-mixing ATAC SRA record can be downloaded with:

```bash
fasterq-dump -A SRR19912835 --split-files -p
```

This generates two fastq files: `SRR19912835_1.fastq` and `SRR19912835_2.fastq`.
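Before converting, a quick sanity check is that both mates contain the same number of reads. The sketch below assumes standard four-line FASTQ records (as in the `head -4` output in this section); `check_pair` is a hypothetical helper name, not part of the SHARE-seq pipeline.

```bash
# check_pair R1.fastq R2.fastq  (hypothetical helper)
# Prints the read count of each mate and returns non-zero if they differ.
# Assumes every FASTQ record spans exactly four lines.
check_pair() {
    local n1 n2
    n1=$(( $(wc -l < "$1") / 4 ))
    n2=$(( $(wc -l < "$2") / 4 ))
    echo "$n1 $n2"
    [ "$n1" -eq "$n2" ]
}
```

Usage: `check_pair SRR19912835_1.fastq SRR19912835_2.fastq` prints the two counts and returns a non-zero status if they differ.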

The read header looks like this:

```
$ head -4 SRR19912835_1.fastq
@SRR19912835.1 A01389:111:H2Y5KDMXY:1:1101:1127:1000_R1.003,R2.032,R3.081,P1.06 length=50
GGGCTACACAGAGAAACCCTGTCTCGAAAAACAAACAAAACAAAACAAAA
+SRR19912835.1 A01389:111:H2Y5KDMXY:1:1101:1127:1000_R1.003,R2.032,R3.081,P1.06 length=50
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:
```
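In the header above, the second whitespace-separated field is the Illumina read name joined by `_` to a comma-separated barcode string (`R1.003,R2.032,R3.081,P1.06`). To inspect what that field carries, the sketch below prints both parts for the first few reads. `show_barcodes` is a hypothetical helper, and it assumes the barcode string is whatever follows the last `_` in that field.

```bash
# show_barcodes FILE.fastq [N]  (hypothetical helper)
# For the first N reads (default 4), print the read name and the barcode
# string, i.e. the parts of header field 2 before and after its last "_".
show_barcodes() {
    awk -v n="${2:-4}" 'NR % 4 == 1 {
        name = $2; bc = $2
        sub(/_[^_]*$/, "", name)   # strip "_<barcodes>", leaving the read name
        sub(/^.*_/, "", bc)        # strip "<read name>_", leaving the barcodes
        print name "\t" bc
        if (++seen >= n) exit
    }' "$1"
}
```

For the read shown above, `show_barcodes SRR19912835_1.fastq 1` prints `A01389:111:H2Y5KDMXY:1:1101:1127:1000` and `R1.003,R2.032,R3.081,P1.06`, separated by a tab.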

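Run the following commands to rewrite the read headers and convert the files to a format compatible with the SHARE-seq pipeline. Each header is replaced by its second field (the Illumina read name with the barcode string appended), the `+` separator line is reduced to a bare `+`, and the output is compressed with bgzip.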
```bash
# Header -> "@" + field 2; separator -> "+"; sequence and quality lines unchanged.
awk '{if (NR%4==1) print "@"$2; else if (NR%4==3) print "+"; else print $0}' SRR19912835_1.fastq | bgzip > speciesmix.ATAC.R1.fastq.gz
awk '{if (NR%4==1) print "@"$2; else if (NR%4==3) print "+"; else print $0}' SRR19912835_2.fastq | bgzip > speciesmix.ATAC.R2.fastq.gz
```
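A slip in the `NR%4` tests of an awk filter like this one can silently drop or mangle lines, so it may be worth validating the converted files before running the pipeline. The sketch below uses a hypothetical `check_fastq` helper that decompresses a file with `gzip -dc` (bgzip output is gzip-compatible) and checks that each record has an `@` header, a `+` separator and a quality string as long as its sequence.

```bash
# check_fastq FILE.fastq.gz  (hypothetical helper)
# Fails (non-zero exit) if any record is not a well-formed 4-line FASTQ record.
check_fastq() {
    gzip -dc "$1" | awk '
        NR % 4 == 1 { if (substr($0, 1, 1) != "@") { print "bad header at line " NR; bad = 1 } }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 { if (substr($0, 1, 1) != "+") { print "bad separator at line " NR; bad = 1 } }
        NR % 4 == 0 { if (length($0) != seqlen) { print "quality/sequence length mismatch at line " NR; bad = 1 } }
        END {
            if (NR % 4 != 0) { print "truncated record: " NR " lines"; bad = 1 }
            if (!bad) print NR / 4 " reads OK"
            exit bad
        }'
}
```

For example, `check_fastq speciesmix.ATAC.R1.fastq.gz` prints the number of reads if every record passes, or the offending line numbers and a non-zero exit status otherwise.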