Feature: similarity based clustering for pair-end sequences which cannot be assembled  #175

@lentendu

Because of poor 3'-end quality, some datasets cannot be paired-end assembled.
Add an option to force the analysis even without paired-end assembly.
A possible approach for dissimilarity-based clustering:

  1. optimize length trimming based on maxEE/quality
  2. if too few sequences are paired-end assembled and the force option is on, skip paired-end assembly at the trim step
  3. dereplicate and compute all pairwise dissimilarities per fragment
  4. resolve the unique set of assembled dereplicated sequences from both strands (adding a gap of length xx between the two fragments)
  5. average the dissimilarity over all pairs of assembled dereplicated sequences, weighted by fragment length
  6. cluster (sumaclust or MCL) on the averaged dissimilarity
  7. vsearch --usearch_global with no gap extension penalty for the query sequence (--gapext 2IT/0IQ/1E) --> to test
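
Steps 3 to 5 could look roughly like the Python sketch below. It is only an illustration under assumptions that are not part of the proposal: every forward fragment is trimmed to one fixed length and every reverse fragment to another, per-fragment dissimilarities are already available as dictionaries, and a pair absent from a dictionary gets a default dissimilarity. The names `weighted_dissimilarity`, `pairwise_weighted` and `missing` are hypothetical.

```python
from itertools import combinations


def weighted_dissimilarity(d_fwd, d_rev, len_fwd, len_rev):
    """Length-weighted mean of the forward and reverse fragment dissimilarities."""
    total = len_fwd + len_rev
    if total <= 0:
        raise ValueError("fragment lengths must sum to a positive value")
    return (d_fwd * len_fwd + d_rev * len_rev) / total


def pairwise_weighted(pairs, fwd_diss, rev_diss, len_fwd, len_rev, missing=1.0):
    """Average dissimilarity for every pair of (forward, reverse) combinations.

    pairs    : list of (fwd_id, rev_id), the unique set from step 4
    fwd_diss : dict frozenset({id_a, id_b}) -> dissimilarity (forward fragments)
    rev_diss : same, for reverse fragments
    missing  : assumed dissimilarity when a fragment pair is absent (hypothetical)
    Returns a dict mapping (i, j) index pairs, i < j, to the weighted average.
    """
    def lookup(table, a, b):
        # Identical dereplicated fragments have zero dissimilarity.
        if a == b:
            return 0.0
        return table.get(frozenset((a, b)), missing)

    result = {}
    for (i, (fa, ra)), (j, (fb, rb)) in combinations(enumerate(pairs), 2):
        d_f = lookup(fwd_diss, fa, fb)
        d_r = lookup(rev_diss, ra, rb)
        result[(i, j)] = weighted_dissimilarity(d_f, d_r, len_fwd, len_rev)
    return result


if __name__ == "__main__":
    # Toy input: three unique read pairs built from two forward and two reverse fragments.
    pairs = [("F1", "R1"), ("F1", "R2"), ("F2", "R2")]
    fwd = {frozenset(("F1", "F2")): 0.04}
    rev = {frozenset(("R1", "R2")): 0.10}
    for key, value in pairwise_weighted(pairs, fwd, rev, 200, 100).items():
        print(key, round(value, 4))
```

The resulting table could then be written out in whatever format the clustering tool of step 6 expects. Whether a fixed `missing` value is appropriate depends on how the per-fragment dissimilarities are thresholded.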
