Hi! Thanks for making this amazing tool.
I noticed two problems:
- ultraplex does not remove the barcode from the other read when the insert is short.
- ultraplex removes the space between the read ID and the rest of the header line, so when I then try to remove the barcode from the other read, cutadapt cannot verify that the two reads are paired (the read ID is concatenated with the rest of the header, so the names no longer match).
In detail:
Problem 1:
For a library with barcode ATGCGCAG, the reverse reads in the output still contain the reverse complement of the barcode at the end (CTGCGCAT is the reverse complement of ATGCGCAG):
zcat ultraple_demux_somthing_Rev.fastq.gz | grep -v '@' | grep CTGCGCAT
GACGTCGTGCTCTCCCCCTGCGCAT
CCCCCCGCGGGGGCGCGCCGGTTCTGCGCAT
CTCCCGGGGCTACGCCTGTCTGAGCGTCGCTATCTGCGCAT
GAAAGTCGGAGACCTGCGCAT
TCCCGGGGCTACGCCTGTCTGAGCGTCGCTTTACTGCGCAT
GGCGGCGTCCGGTGAGCTCTCGCTGGCCTTCTGCGCAT
GGCTACGCCTGTCTGAGCGTCGCTTGTCTGCGCAT
GTCCTGGGAAACGGGGCGCGGCCGGCCCTGCGCAT
TGGTGACCACGGGTGACGGGGAAGCTGCGCAT
GACCCGCCGGGCAGCTTCCGGGAAACCAAAATCTGCGCATA
GGTTCGATTCCGGTTGCGTCCACCCACTGCGCAT
These reads will be difficult to map.
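To make the read-through above concrete, here is a minimal Python sketch of the check and of a possible workaround. It is not ultraplex's or cutadapt's implementation; the function names `reverse_complement` and `trim_read_through` are hypothetical, and it assumes an exact match to the reverse-complemented barcode, with everything from that match onward being read-through.

```python
from __future__ import annotations

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_read_through(seq: str, qual: str, barcode: str) -> tuple[str, str]:
    """Cut a read at the last exact occurrence of the barcode's reverse complement.

    Assumption: anything from that occurrence to the end of the read is
    barcode or adapter read-through, not insert. Reads without a match
    are returned unchanged.
    """
    pos = seq.rfind(reverse_complement(barcode))
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


if __name__ == "__main__":
    read = "GAAAGTCGGAGACCTGCGCAT"  # one of the reads listed above
    print(reverse_complement("ATGCGCAG"))
    print(trim_read_through(read, "F" * len(read), "ATGCGCAG"))
```

In practice a dedicated trimmer is probably the better choice, since it tolerates sequencing errors in the barcode; cutadapt's `-a` (first read) and `-A` (second read) options take a 3' adapter sequence for this purpose.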
Problem 2:
In the output, the read ID is concatenated with the rest of the header line, which causes a problem with cutadapt:
# cutadapt error
ERROR: Error in sequence file at unknown line: Reads are improperly paired. Read name 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT1:N:0:ATTACTCG+AGGATAGGrbc:' in file 1 does not match 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT2:N:0:ATTACTCG+AGGATAGGrbc:' in file 2.
cutadapt should only need to compare the read IDs, A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT == A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT, which match. But because the space after the read ID is gone, it compares the whole header lines and finds 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT1:N:0:ATTACTCG+AGGATAGGrbc:' != 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT2:N:0:ATTACTCG+AGGATAGGrbc:'.
output from ultraplex:
(base) [hsher@tscc-login1 fastqs]$ zcat ultraplex_demux_PUM2_Fwd.fastq.gz | head
@A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG2:N:0:ATTACTCG+AGGCTATArbc:
AGTTGGGGAAATCGCAGGGGTCAGCACATCCGGAGTGCAATG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF
Notice that the space between @A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG and 2:N:0:ATTACTCG+AGGCTATArbc: is gone.
input to ultraplex:
(base) [hsher@tscc-login1 fastqs]$ zcat all.Tr.umi.fq2.trim.gz | head
@A00475:502:HJLHHDRX2:1:2101:1090:1000:NCACCACCTG 2:N:0:ATTACTCG+AGGCTATA
CGTAGTAAACTCTCCCCGGGGCTCCCGCCGGCTTCTCCGGG
+
FFFFFFFF:FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF
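As a possible workaround for problem 2, the missing space could be restored before running cutadapt. The sketch below is only an illustration: `restore_space` and `read_id` are hypothetical helper names, and the regular expression assumes the header ends with an Illumina-style comment of the form `<1|2>:<Y|N>:<number>:<index>`, as in the headers shown above. It expects a single header line without the trailing newline.

```python
import re

# Read ID (no whitespace), followed directly by an Illumina-style comment:
# read number (1 or 2), filter flag (Y or N), control number, index string.
# Assumption: the first place this pattern fits is where the comment begins.
_FUSED_HEADER = re.compile(r"^(\S+?)([12]:[YN]:\d+:\S*)$")


def restore_space(header: str) -> str:
    """Re-insert the space between the read ID and the comment.

    Headers that already contain whitespace, or that do not end with the
    expected comment, are returned unchanged.
    """
    match = _FUSED_HEADER.match(header)
    if match is None:
        return header
    return f"{match.group(1)} {match.group(2)}"


def read_id(header: str) -> str:
    """Return the part of the header that both mates should share."""
    return restore_space(header).split()[0]


if __name__ == "__main__":
    fused = "@A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG2:N:0:ATTACTCG+AGGCTATArbc:"
    print(restore_space(fused))
```

Applied to the two names from the cutadapt error, `read_id` returns the same string for both mates, which is the comparison cutadapt would be able to make if the space were preserved.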