-
Notifications
You must be signed in to change notification settings - Fork 9
feat(backend, deposition): add a deposition status to the backend and use to filter get-released-data prior to streaming #5767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… use to restrict number of sequences that need to be streamed
f98fb5d to
a972104
Compare
Code ReviewThis PR implements a coarse-grained filter for the Overall AssessmentThe implementation is well-designed and achieves its performance goal effectively. The code quality is good with clear separation of concerns between database logic, API layer, and client code. Positive Aspects✅ Performance improvement: Dramatic reduction in execution time (15-20min → 1.5min) by filtering at the database level rather than in application code Issues & Concerns🔴 Critical: Missing Test CoverageThe PR adds a new query parameter and significant database logic but doesn't include any tests. This is concerning because:
Recommendation: Add tests to
🟡 Potential Logic Issue: Hardcoded Group IDWHEN (se.group_id = 1) THEN 'INGESTED'::textThe use of hardcoded
Concerns:
Recommendations:
🟡 SQL Migration Duplicates View LogicThe migration file Observation: If the view logic changes in the future, developers must remember to update both files. Consider adding a comment in 🟡 Parameter Naming InconsistencyThe parameter is called @RequestParam(name = "enaDeposition", required = false, defaultValue = "false") enaDeposition: BooleanBut in Python it's passed as a query parameter with the same name. Consider if a more descriptive name like 🟡 Missing API DocumentationThe new query parameter isn't documented in the controller: @RequestParam(
name = "enaDeposition",
required = false,
defaultValue = "false",
) enaDeposition: Boolean,Recommendation: Add a @Parameter(
description = "(Optional) Filter to return only sequences ready for ENA deposition (excludes already deposited and ingested sequences)",
)
@RequestParam(name = "enaDeposition", required = false, defaultValue = "false") enaDeposition: Boolean,Code Quality ObservationsMinor: Formatting Issue in SubmissionDatabaseService.ktThe conditional logic around line 721-726 has inconsistent formatting: (
if (enaDeposition) {
(SequenceEntriesView.enDepositionStatusIs(EnaDepositionStatus.READY))
} else {
Op.TRUE
}
)The extra parentheses around the entire conditional and around if (enaDeposition) {
SequenceEntriesView.enDepositionStatusIs(EnaDepositionStatus.READY)
} else {
Op.TRUE
}Good: Type SafetyNice use of the Security & Performance✅ SQL Injection: No concerns - using Exposed framework's type-safe query builders Recommendations SummaryMust address:
Should address: Nice to have: Questions
Overall, this is a solid performance optimization with clean implementation. The main concern is the lack of tests for the new behavior. With tests added, this would be ready to merge. |
resolves #
Summary
Add a new field to the sequences_entries_view called
ena_deposition_statuswhich is an enum of type:This column is used as a coarse-grained filter option when calling the
get-released-dataendpoint.By adding the param
filterForEnaDepositionthe endpoints stops streaming sequences that were ingested from the INSDC or have been deposited by us in the INSDC with external metadata updated. Edge cases, such as revised versions, restricted data etc. are still handled by the ena cronjob.The cronjob now takes 1min and 30seconds to run instead of the 15-20min it currently takes.
I also confirmed with spot checking that the new status was set correctly for these submissions.
PR Checklist
🚀 Preview: https://deposited-status-filter.loculus.org