Functions to select sequence subsamples based on different weighting methods

- Input: fasta, count, and/or dist file.
- Params: weighting method (by abundance, by distance, or simple random sample); fraction or number of sequences to select.
- Output: list of selected accession numbers (could be written to an accnos file and then passed to mothur `get.seqs`).
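
A rough sketch of the interface described above, in Python for illustration only. This is not the linked OptiFit script: the function name `select_subsample`, its parameters, and the use of Efraimidis-Spirakis keys for weighted sampling without replacement are all assumptions. How the weights get derived from a count or dist file is left to the caller.

```python
import heapq
import math
import random


def select_subsample(seq_names, weights=None, fraction=None, num_seqs=None, seed=None):
    """Sample sequence names without replacement (hypothetical helper).

    weights: optional dict of name -> non-negative weight, e.g. abundance
    taken from a count file. None means a simple random sample.
    Exactly one of fraction or num_seqs must be given.
    """
    seq_names = list(seq_names)
    if (fraction is None) == (num_seqs is None):
        raise ValueError("specify exactly one of fraction or num_seqs")
    if num_seqs is None:
        # Assumption: round the requested fraction to the nearest whole count.
        num_seqs = round(fraction * len(seq_names))
    if not 0 <= num_seqs <= len(seq_names):
        raise ValueError("requested sample size is out of range")

    rng = random.Random(seed)
    if weights is None:
        return rng.sample(seq_names, num_seqs)

    # Names with zero (or missing) weight can never be selected.
    candidates = [name for name in seq_names if weights.get(name, 0) > 0]
    if num_seqs > len(candidates):
        raise ValueError("not enough sequences with positive weight")

    # Efraimidis-Spirakis: give each item the key u ** (1 / w) and keep the
    # largest keys. log(u) / w has the same ordering and avoids underflow.
    keyed = [(math.log(1.0 - rng.random()) / weights[name], name) for name in candidates]
    return [name for _, name in heapq.nlargest(num_seqs, keyed)]


if __name__ == "__main__":
    abundance = {"seq1": 10, "seq2": 1, "seq3": 5, "seq4": 2}
    # One name per line, the layout an accnos file would use.
    print("\n".join(select_subsample(abundance, weights=abundance, fraction=0.5, seed=42)))
```

Passing a `seed` keeps the selection reproducible, which should make an R port easier to test against fixed expectations.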

Already [implemented in Python for the OptiFit benchmarking project](https://github.com/SchlossLab/OptiFitAnalysis/blob/master/code/py/split_weighted_subsample.py). The plan is to rewrite it in R and add tests. The [mothur `sub.sample` command](https://mothur.org/wiki/sub.sample/) doesn't have a weighting parameter.



