Roadmap for asynchronous BEDbase/BEDboss data processing #84

@nsheff

We need these components for asynchronous processing of BED files to populate bedbase.

  1. Automatic registering of new BED files posted to GEO.
  • we already create a PEP with new BED files automatically, using a GitHub Action
  • bbuploader process reads from PEPhub (run by bedboss geo upload-all)
  • implement a --light CLI arg, as in bedboss geo upload --light ... (compute ID, upload BED file, collect and input metadata)
  • add a new GitHub Action that will run this process weekly.
  • make sure current endpoints properly ignore "stub" records (those that haven't yet gone through the full/heavy process), if needed.
  2. Allowing users to register new BEDsets
  • endpoint for registering a BEDset
    • POST a PEPhub registry path. BEDbase retrieves this PEP, validates it against the BEDbase::bedset schema, and then creates a new BEDset in the database. (This relies completely on PEPhub auth, so no further auth is required?)
    • Validation: validate against the PEP schema, then do other validations. Limit size for now to, say, 2k BEDs? Maybe limit throughput? Maybe, to start, hard-code a list of "allowed" PEPhub namespaces.
    • Create a button on the BEDbase cart page to "Create BEDset PEP for this Cart". (This would require the user to authenticate with PEPhub...)
  3. Daemon that retrieves unprocessed BED files and processes them (plots and statistics)
    • endpoint for unprocessed files?
    • endpoint for unprocessed plots
    • endpoint for unprocessed statistics
    • script that hits these endpoints, and then does the correct processing (in BEDboss)
    • wrapper daemon that runs the above script in a loop, sleeping between runs.
  4. Daemon that retrieves unprocessed BEDsets and processes them.
    • endpoint for retrieving which BEDsets are registered, but not processed
    • script that hits the BEDset endpoints, and then does the correct processing (in BEDboss)
    • wrapper daemon that runs the above script in a loop, sleeping between runs.
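
The two daemon items above share the same shape: a script that asks for unprocessed records and processes each one, plus a wrapper that sleeps between runs. A minimal sketch of that shape follows. The names `run_once`, `run_daemon`, `fetch_unprocessed`, and `process` are hypothetical; in practice `fetch_unprocessed` would call whichever "unprocessed" endpoint ends up existing, and `process` would call into BEDboss.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


def run_once(
    fetch_unprocessed: Callable[[], Iterable[str]],
    process: Callable[[str], None],
) -> Dict[str, List[str]]:
    """Process every pending record once; return processed and failed IDs."""
    result: Dict[str, List[str]] = {"processed": [], "failed": []}
    for record_id in fetch_unprocessed():
        try:
            process(record_id)
            result["processed"].append(record_id)
        except Exception as err:
            # Keep going so that one bad record does not stall the queue.
            print(f"failed to process {record_id}: {err}")
            result["failed"].append(record_id)
    return result


def run_daemon(
    fetch_unprocessed: Callable[[], Iterable[str]],
    process: Callable[[str], None],
    sleep_seconds: float = 3600.0,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Wrap run_once in a sleep loop; max_cycles=None runs forever."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        run_once(fetch_unprocessed, process)
        cycles += 1
        # Sleep only if another cycle will follow.
        if max_cycles is None or cycles < max_cycles:
            sleep(sleep_seconds)
    return cycles
```

Passing the fetch and process steps in as callables means the same wrapper could serve both the BED file daemon and the BEDset daemon, and the `sleep` and `max_cycles` parameters exist only so the loop can be exercised in tests without waiting.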

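For the BEDset registration endpoint described above, the cheap pre-checks (namespace allow-list and size limit) could run before the PEP is validated against the schema. The sketch below is only an illustration: `check_bedset_registration`, `ALLOWED_NAMESPACES`, and the `example-lab` namespace are hypothetical, and it assumes a registry path of the form `namespace/name:tag`. Schema validation itself is not shown.

```python
from typing import Collection, List, Sequence

MAX_BEDS = 2000  # the "say 2k BEDs" limit floated above
ALLOWED_NAMESPACES = frozenset({"example-lab"})  # hypothetical allow-list


def check_bedset_registration(
    registry_path: str,
    bed_ids: Sequence[str],
    allowed_namespaces: Collection[str] = ALLOWED_NAMESPACES,
    max_beds: int = MAX_BEDS,
) -> List[str]:
    """Return a list of problems; an empty list means the request may proceed."""
    errors: List[str] = []

    # Assumed path shape: "namespace/name:tag"; only the namespace is checked here.
    namespace, sep, rest = registry_path.partition("/")
    if not sep or not namespace or not rest:
        errors.append(f"malformed registry path: {registry_path!r}")
    elif namespace not in allowed_namespaces:
        errors.append(f"namespace not allowed: {namespace!r}")

    if len(bed_ids) == 0:
        errors.append("BEDset contains no BED files")
    elif len(bed_ids) > max_beds:
        errors.append(f"BEDset has {len(bed_ids)} BED files; limit is {max_beds}")

    return errors
```

Returning all problems at once, rather than failing on the first, lets the endpoint report everything a user needs to fix in a single response.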