Functions to select sequence subsamples based on different weighting methods #11

@kelly-sovacool

Description

  • Input: a fasta, count, and/or dist file.
  • Params: weighting method (by abundance, by distance, or simple random sample); fraction or number of sequences to select.
  • Output: list of selected accession numbers (could be written to an accnos file and then passed to mothur get.seqs).

This is already implemented in Python for the OptiFit benchmarking project. I want to rewrite it in R and add tests. The mothur sub.sample command doesn't have a weighting parameter.
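
For discussion, here is a rough Python sketch of the selection step described above. It is not the OptiFit project's code, and every name in it (`select_seqs`, `write_accnos`, the `weights` dict) is hypothetical. It assumes the per-sequence weights have already been derived from the input files: abundances from a count file, or some distance-based score from a dist file, which this issue leaves undefined. Weighted sampling without replacement is done here with Efraimidis-Spirakis random keys, which is one possible choice among several.

```python
import random
from typing import Dict, Iterable, List, Optional


def select_seqs(
    weights: Dict[str, float],
    fraction: Optional[float] = None,
    number: Optional[int] = None,
    weighted: bool = True,
    seed: Optional[int] = None,
) -> List[str]:
    """Sample accession IDs without replacement (hypothetical helper).

    weights maps accession ID -> positive weight (e.g. abundance).
    Exactly one of fraction or number must be given.
    """
    if (fraction is None) == (number is None):
        raise ValueError("give exactly one of fraction or number")
    ids = sorted(weights)  # fixed order so a given seed is reproducible
    k = number if number is not None else round(fraction * len(ids))
    if not 0 <= k <= len(ids):
        raise ValueError("sample size must be between 0 and the number of sequences")
    rng = random.Random(seed)
    if not weighted:
        # Simple random sample: every sequence is equally likely.
        return rng.sample(ids, k)
    if any(weights[i] <= 0 for i in ids):
        raise ValueError("weights must be positive for weighted sampling")
    # Efraimidis-Spirakis: draw u in [0, 1), key = u ** (1 / w),
    # then keep the k sequences with the largest keys.
    keyed = [(rng.random() ** (1.0 / weights[i]), i) for i in ids]
    keyed.sort(reverse=True)
    return [seq_id for _, seq_id in keyed[:k]]


def write_accnos(ids: Iterable[str], path: str) -> None:
    """Write one accession ID per line, the layout of an accnos file."""
    with open(path, "w") as handle:
        for seq_id in ids:
            handle.write(f"{seq_id}\n")


# Made-up abundances, for illustration only.
counts = {"seqA": 100, "seqB": 10, "seqC": 1, "seqD": 1}
picked = select_seqs(counts, fraction=0.5, seed=42)
```

The resulting list could be written with `write_accnos` and the file handed to mothur get.seqs, as suggested in the output bullet. An R version would presumably have the same shape, with the weighting method deciding how `weights` is built.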

Metadata

Labels

feature: a feature request or enhancement
