- Input: fasta, count, and/or dist file.
- Params: weighting method (by abundance, by distance, or simple random sample); fraction or number of sequences to select.
- Output: list of the accession numbers of the selected sequences (could be written to an accnos file and then passed to mothur get.seqs).
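Below is a minimal sketch of the selection logic. It is not the OptiFit code, and all function names are hypothetical. Several assumptions apply:

- Files are parsed elsewhere. Abundances arrive as a dict, e.g. built from the total column of a count file. Distances arrive as `(seq1, seq2, distance)` tuples, e.g. from a column-format dist file.
- "By distance" is interpreted as "1 + number of other sequences within a distance threshold". This interpretation is a guess, not something the plan specifies.
- Sampling is without replacement.

```python
import random
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical alias: one row of a column-format dist file (seq1, seq2, distance).
DistPair = Tuple[str, str, float]


def weights_by_distance(
    seq_ids: Iterable[str], pairs: Iterable[DistPair], threshold: float = 0.03
) -> Dict[str, float]:
    """ASSUMPTION: weight = 1 + number of other sequences within `threshold`."""
    weights = {seq: 1.0 for seq in seq_ids}
    for a, b, dist in pairs:
        if dist <= threshold:
            # Credit the close pair to each endpoint that is a candidate.
            if a in weights:
                weights[a] += 1.0
            if b in weights:
                weights[b] += 1.0
    return weights


def weighted_sample(weights: Dict[str, float], n: int, rng: random.Random) -> List[str]:
    """Draw n distinct IDs one at a time, each draw proportional to the remaining weights."""
    pool = dict(weights)
    chosen: List[str] = []
    for _ in range(n):
        ids = list(pool)
        pick = rng.choices(ids, weights=[pool[i] for i in ids], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # without replacement
    return chosen


def select_seqs(
    seq_ids: Sequence[str],
    method: str = "simple",
    n: Optional[int] = None,
    fraction: Optional[float] = None,
    counts: Optional[Dict[str, int]] = None,
    dist_pairs: Optional[Iterable[DistPair]] = None,
    threshold: float = 0.03,
    seed: Optional[int] = None,
) -> List[str]:
    """Select IDs by simple random, abundance-weighted, or distance-weighted sampling."""
    ids = list(dict.fromkeys(seq_ids))  # de-duplicate, keep input order
    if (n is None) == (fraction is None):
        raise ValueError("give exactly one of n or fraction")
    if n is None:
        n = round(fraction * len(ids))
    if not 0 <= n <= len(ids):
        raise ValueError(f"cannot select {n} of {len(ids)} sequences")

    if method == "simple":
        weights = {s: 1.0 for s in ids}
    elif method == "abundance":
        if counts is None:
            raise ValueError("abundance weighting needs counts (e.g. from a count file)")
        weights = {s: float(counts[s]) for s in ids}
    elif method == "distance":
        if dist_pairs is None:
            raise ValueError("distance weighting needs dist_pairs (e.g. from a dist file)")
        weights = weights_by_distance(ids, dist_pairs, threshold)
    else:
        raise ValueError(f"unknown method: {method!r}")

    return weighted_sample(weights, n, random.Random(seed))


def to_accnos(ids: Iterable[str]) -> str:
    """Format IDs as accnos text: one sequence name per line."""
    return "".join(f"{i}\n" for i in ids)
```

For example, `to_accnos(select_seqs(ids, method="abundance", fraction=0.1, counts=counts, seed=42))` would give text to save as an accnos file.

The one-draw-at-a-time loop is O(n·N), which keeps the weighting easy to read and test. A port to R could follow the same split into weighting, sampling, and output.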
This is already implemented in Python for the OptiFit benchmarking project. Want to rewrite it in R and write tests. The mothur sub.sample command doesn't have a weighting parameter, so it can't be used for this.