-
Notifications
You must be signed in to change notification settings - Fork 0
Added modules to enable iRODS operations #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…on and iRODS metadata modules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces comprehensive iRODS (Integrated Rule-Oriented Data System) integration modules for Nextflow pipelines, enabling seamless data storage and metadata management. The implementation includes four core iRODS modules, a CSV utility module, and the necessary testing infrastructure.
Key Changes
- Added four iRODS operation modules:
storefile,getmetadata,attachmetadata, andaggregatemetadatafor complete data lifecycle management - Implemented
csv/concatutility module for merging multiple CSV files with configurable join options - Established nf-test framework configuration and comprehensive test suites with snapshot testing
Reviewed Changes
Copilot reviewed 24 out of 27 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| modules/sanger-cellgeni/irods/storefile/* | Uploads files to iRODS with MD5 checksum verification and parallel transfer support |
| modules/sanger-cellgeni/irods/getmetadata/* | Retrieves and formats iRODS metadata as CSV from collections and data objects |
| modules/sanger-cellgeni/irods/attachmetadata/* | Attaches metadata to iRODS paths with duplicate detection and delimiter support |
| modules/sanger-cellgeni/irods/aggregatemetadata/* | Aggregates metadata from multiple CSV files into consolidated CSV/JSON outputs |
| modules/sanger-cellgeni/csv/concat/* | Concatenates CSV files with configurable axis and join strategies |
| nf-test.config, tests/config/nf-test.config | Configures nf-test framework with plugins and execution settings |
| modules/meta-schema.json | JSON schema for module metadata validation (copied from nf-core/modules) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "${file}" "${irodspath}" | ||
|
|
||
| # Calculate iRODS md5 | ||
| sleep 1 # wait for iRODS to do it's thing |
Copilot
AI
Oct 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'it's' to 'its' (possessive form, not contraction).
| sleep 1 # wait for iRODS to do it's thing | |
| sleep 1 # wait for iRODS to do its thing |
|
|
||
| touch ${prefix}.bam |
Copilot
AI
Oct 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stub section creates a .bam file but the process doesn't produce any file outputs according to the output definition. This appears to be copy-paste residue from a template.
| touch ${prefix}.bam |
| "--axis 'index'", | ||
| "--join 'outer'", |
Copilot
AI
Oct 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The quotes around 'index' are unnecessary and inconsistent. Since this is being passed as a command-line argument, simple quoting (without the inner quotes) would be cleaner: --axis index
| "--axis 'index'", | |
| "--join 'outer'", | |
| "--axis index", | |
| "--join outer", |
| | (grep -E 'attribute|value|units' || true) \ | ||
| | sed -e 's/^attribute: //' -e 's/^value: //' -e 's/^units: //' \ | ||
| | sed -e "s/\\\"/'/g" \ | ||
| | awk 'NR%3!=0 {printf "\\\"%s\\\",", \$0} NR%3==0 {printf "\\\"%s\\\"\\n", \$0}' > irods_metadata.csv |
Copilot
AI
Oct 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The complex awk command with excessive escaping is difficult to read and maintain. Consider using a here-document or breaking it into multiple steps for better clarity.
| | awk 'NR%3!=0 {printf "\\\"%s\\\",", \$0} NR%3==0 {printf "\\\"%s\\\"\\n", \$0}' > irods_metadata.csv | |
| | awk ' | |
| NR%3!=0 {printf "\"%s\",", $0} | |
| NR%3==0 {printf "\"%s\"\n", $0} | |
| ' > irods_metadata.csv |
| singularity { | ||
| singularity.enabled = true | ||
| singularity.autoMounts = true | ||
| singularity.runOptions = '-B /lustre,/nfs' |
Copilot
AI
Oct 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoded mount paths /lustre,/nfs are environment-specific and may not exist in all execution environments. Consider making these configurable via parameters or environment variables.
| singularity.enabled = true | ||
| singularity.autoMounts = true | ||
| singularity.runOptions = '-B /lustre,/nfs' | ||
| singularity.cacheDir = '/nfs/cellgeni/singularity/images/' |
Copilot
AI
Oct 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoded cache directory path is environment-specific and may not exist in all execution environments. Consider making this configurable via parameters or environment variables.
| singularity.cacheDir = '/nfs/cellgeni/singularity/images/' | |
| singularity.cacheDir = params.singularity_cache_dir ?: System.getenv('SINGULARITY_CACHEDIR') ?: '/nfs/cellgeni/singularity/images/' |
Removed unnecessary echo and touch commands, and added metadata output to a file.
Overview
This PR introduces comprehensive iRODS (Integrated Rule-Oriented Data System) integration for Nextflow pipelines, enabling seamless data management workflows.
New Modules Added
iRODS Integration Suite:
aggregatemetadata- Aggregates iRODS metadata from CSV files with duplicate handling, outputs CSV/JSONattachmetadata- Attaches metadata to iRODS collections/objects with duplicate detectiongetmetadata- Retrieves metadata from iRODS paths, formats as CSVstorefile- Uploads files to iRODS with MD5 verification and parallel supportUtility Module:
csv/concat- Concatenates multiple CSV files with flexible join optionsAdditional files
modules/meta-schema.json- json validation schema that I copied from nf-core/modules repository to enable linting withnf-core tools