ultraplex does not remove barcode from the other read (when insert is short); ultraplex destroy read ID #39

@algaebrown

Hi! Thanks for making this amazing tool

I noticed two problems:

  1. ultraplex does not remove the barcode from the other read when the insert is short.
  2. ultraplex removes the space between the read ID and the rest of the header line. When I then try to remove the barcode from the other read with cutadapt, it cannot verify that the two reads are paired, because the read ID is now concatenated with the rest of the header.

In particular:

problem 1:
For a library with barcode ATGCGCAG, the reverse-read output still contains the reverse complement of the barcode (CTGCGCAT) at or near the 3' end:

zcat ultraple_demux_somthing_Rev.fastq.gz | grep -v '@' | grep CTGCGCAT

GACGTCGTGCTCTCCCCCTGCGCAT
CCCCCCGCGGGGGCGCGCCGGTTCTGCGCAT
CTCCCGGGGCTACGCCTGTCTGAGCGTCGCTATCTGCGCAT
GAAAGTCGGAGACCTGCGCAT
TCCCGGGGCTACGCCTGTCTGAGCGTCGCTTTACTGCGCAT
GGCGGCGTCCGGTGAGCTCTCGCTGGCCTTCTGCGCAT
GGCTACGCCTGTCTGAGCGTCGCTTGTCTGCGCAT
GTCCTGGGAAACGGGGCGCGGCCGGCCCTGCGCAT
TGGTGACCACGGGTGACGGGGAAGCTGCGCAT
GACCCGCCGGGCAGCTTCCGGGAAACCAAAATCTGCGCATA
GGTTCGATTCCGGTTGCGTCCACCCACTGCGCAT

These reads will be hard to map, because the leftover barcode sequence does not belong to the insert.
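To make the check above reproducible without `grep`, here is a minimal Python sketch. It confirms that CTGCGCAT is the reverse complement of ATGCGCAG and counts reads that still carry it at the 3' end. The function names and the `slack` parameter (extra bases tolerated after the barcode) are hypothetical helpers for illustration, not part of ultraplex.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_trailing_barcode(reads, barcode: str, slack: int = 1) -> int:
    """Count reads whose 3' end still contains the reverse complement
    of `barcode`, allowing up to `slack` extra bases after it."""
    rc = reverse_complement(barcode)
    hits = 0
    for read in reads:
        tail = read[-(len(rc) + slack):]
        if rc in tail:
            hits += 1
    return hits


# Two reads copied from the grep output above, plus one read without the barcode.
reads = [
    "GACGTCGTGCTCTCCCCCTGCGCAT",
    "GACCCGCCGGGCAGCTTCCGGGAAACCAAAATCTGCGCATA",
    "AGTTGGGGAAATCGCAGGGGTCAGCACATCCGGAGTGCAATG",
]

print(reverse_complement("ATGCGCAG"))
print(count_trailing_barcode(reads, "ATGCGCAG"))
```

Restricting the search to the tail of the read avoids counting chance occurrences of the 8-mer in the middle of an insert.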

problem 2:
In the output, the read ID is concatenated with the rest of the header line, which causes a problem with cutadapt:

# cutadapt error
ERROR: Error in sequence file at unknown line: Reads are improperly paired. Read name 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT1:N:0:ATTACTCG+AGGATAGGrbc:' in file 1 does not match 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT2:N:0:ATTACTCG+AGGATAGGrbc:' in file 2.

The check should be whether A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT == A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT, which is true. Because the space is gone, cutadapt instead compares the whole header lines and finds 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT1:N:0:ATTACTCG+AGGATAGGrbc:' != 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT2:N:0:ATTACTCG+AGGATAGGrbc:'.
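The mismatch can be illustrated with a small Python sketch of the usual pairing convention: compare only the text before the first whitespace, ignoring a trailing `/1` or `/2`. This is an assumption about how paired-end tools typically compare names, not cutadapt's actual code. `pairing_key` and `is_properly_paired` are hypothetical names, and the space-separated headers are reconstructed from the error message by re-inserting the missing space.

```python
def pairing_key(header: str) -> str:
    """Return the part of a FASTQ header used for pairing: the text
    before the first whitespace, without a trailing /1 or /2."""
    if header.startswith("@"):
        header = header[1:]
    name = header.split(maxsplit=1)[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def is_properly_paired(header1: str, header2: str) -> bool:
    """True when both headers yield the same pairing key."""
    return pairing_key(header1) == pairing_key(header2)


base = "A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT"

# With the space intact, only the read ID is compared.
with_space = is_properly_paired(
    "@" + base + " 1:N:0:ATTACTCG+AGGATAGG",
    "@" + base + " 2:N:0:ATTACTCG+AGGATAGG",
)

# Without the space (as in the ultraplex output), the 1/2 read number
# becomes part of the name, so the two keys differ.
without_space = is_properly_paired(
    "@" + base + "1:N:0:ATTACTCG+AGGATAGGrbc:",
    "@" + base + "2:N:0:ATTACTCG+AGGATAGGrbc:",
)

print(with_space, without_space)
```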

output from ultraplex:

(base) [hsher@tscc-login1 fastqs]$ zcat ultraplex_demux_PUM2_Fwd.fastq.gz | head
@A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG2:N:0:ATTACTCG+AGGCTATArbc:
AGTTGGGGAAATCGCAGGGGTCAGCACATCCGGAGTGCAATG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF

Notice that the space between @A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG and 2:N:0:ATTACTCG+AGGCTATArbc: is gone.

input to ultraplex:

(base) [hsher@tscc-login1 fastqs]$ zcat all.Tr.umi.fq2.trim.gz | head
@A00475:502:HJLHHDRX2:1:2101:1090:1000:NCACCACCTG 2:N:0:ATTACTCG+AGGCTATA
CGTAGTAAACTCTCCCCGGGGCTCCCGCCGGCTTCTCCGGG
+
FFFFFFFF:FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF
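Until this is fixed, one possible workaround is to re-insert the space before handing the files to cutadapt. The sketch below is only an illustration under assumptions: it assumes the header ends in an Illumina-style comment (`<read>:<Y|N>:<control>:<index>`) fused directly onto a UMI that ends in a base letter, as in the output above. `restore_header_space` is a hypothetical helper, and the trailing `rbc:` is left untouched.

```python
import re

# Zero-width position between a base letter (end of the UMI) and an
# Illumina-style comment such as "2:N:0:...". Assumed header layout.
FUSED_COMMENT = re.compile(r"(?<=[ACGTN])(?=[12]:[YN]:\d+:)")


def restore_header_space(header: str) -> str:
    """Re-insert the space between the read ID and the comment field.
    Headers that already contain the space are returned unchanged."""
    return FUSED_COMMENT.sub(" ", header, count=1)


fused = "@A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG2:N:0:ATTACTCG+AGGCTATArbc:"
fixed = restore_header_space(fused)
print(fixed)
```

Only every fourth line of a FASTQ file (the header) should be passed through this function; sequence and quality lines must be left as they are.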
