@Claptar Claptar commented Oct 27, 2025

Overview

This PR adds iRODS (Integrated Rule-Oriented Data System) modules for Nextflow pipelines, covering file upload and the retrieval, attachment, and aggregation of metadata.

New Modules Added

iRODS Integration Suite:

  • aggregatemetadata - Aggregates iRODS metadata from CSV files with duplicate handling, outputs CSV/JSON
  • attachmetadata - Attaches metadata to iRODS collections/objects with duplicate detection
  • getmetadata - Retrieves metadata from iRODS paths, formats as CSV
  • storefile - Uploads files to iRODS with MD5 verification and parallel support

Utility Module:

  • csv/concat - Concatenates multiple CSV files with flexible join options

Additional files

  • modules/meta-schema.json - JSON validation schema that I copied from the nf-core/modules repository to enable linting with nf-core tools
  • nf-test config files to set up testing

@Claptar Claptar self-assigned this Oct 27, 2025
@Claptar Claptar added the enhancement New feature or request label Oct 27, 2025

Copilot AI left a comment

Pull Request Overview

This PR introduces comprehensive iRODS (Integrated Rule-Oriented Data System) integration modules for Nextflow pipelines, enabling seamless data storage and metadata management. The implementation includes four core iRODS modules, a CSV utility module, and the necessary testing infrastructure.

Key Changes

  • Added four iRODS operation modules: storefile, getmetadata, attachmetadata, and aggregatemetadata for complete data lifecycle management
  • Implemented csv/concat utility module for merging multiple CSV files with configurable join options
  • Established nf-test framework configuration and comprehensive test suites with snapshot testing

Reviewed Changes

Copilot reviewed 24 out of 27 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| modules/sanger-cellgeni/irods/storefile/* | Uploads files to iRODS with MD5 checksum verification and parallel transfer support |
| modules/sanger-cellgeni/irods/getmetadata/* | Retrieves and formats iRODS metadata as CSV from collections and data objects |
| modules/sanger-cellgeni/irods/attachmetadata/* | Attaches metadata to iRODS paths with duplicate detection and delimiter support |
| modules/sanger-cellgeni/irods/aggregatemetadata/* | Aggregates metadata from multiple CSV files into consolidated CSV/JSON outputs |
| modules/sanger-cellgeni/csv/concat/* | Concatenates CSV files with configurable axis and join strategies |
| nf-test.config, tests/config/nf-test.config | Configures nf-test framework with plugins and execution settings |
| modules/meta-schema.json | JSON schema for module metadata validation (copied from nf-core/modules) |

"${file}" "${irodspath}"

# Calculate iRODS md5
sleep 1 # wait for iRODS to do it's thing

Copilot AI Oct 27, 2025

Corrected spelling of 'it's' to 'its' (possessive form, not contraction).

Suggested change
- sleep 1 # wait for iRODS to do it's thing
+ sleep 1 # wait for iRODS to do its thing
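
The reviewed snippet waits a fixed second before reading the iRODS checksum. As a minimal sketch of an alternative (not the module's actual code), the helper below compares a local MD5 with whatever digest a caller-supplied command prints, retrying a few times instead of sleeping once. `verify_md5` and `local_md5` are hypothetical names, and the command that reports the iRODS-side digest is left to the caller because its output format depends on the deployment.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not taken from the storefile module.

# Print the MD5 hex digest of a local file (assumes GNU coreutils md5sum).
local_md5() {
    md5sum "$1" | cut -d ' ' -f 1
}

# verify_md5 FILE MAX_TRIES DELAY_SECONDS CMD [ARGS...]
# Runs CMD up to MAX_TRIES times and succeeds as soon as its output
# equals the local MD5 of FILE. CMD is expected to print only the digest.
verify_md5() {
    local file=$1 tries=$2 delay=$3
    shift 3
    local expected remote="" attempt
    expected=$(local_md5 "$file")
    for ((attempt = 1; attempt <= tries; attempt++)); do
        remote=$("$@" 2>/dev/null || true)
        if [[ "$remote" == "$expected" ]]; then
            return 0
        fi
        # Wait only if another attempt follows.
        if ((attempt < tries)); then
            sleep "$delay"
        fi
    done
    echo "MD5 mismatch for $file: local=$expected remote=${remote:-<none>}" >&2
    return 1
}
```

A call might look like `verify_md5 data.txt 5 1 get_remote_md5 "$irodspath"`, where `get_remote_md5` is an assumed wrapper that prints only the hex digest stored in iRODS.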

Copilot uses AI. Check for mistakes.
Comment on lines 105 to 106

touch ${prefix}.bam

Copilot AI Oct 27, 2025

The stub section creates a .bam file but the process doesn't produce any file outputs according to the output definition. This appears to be copy-paste residue from a template.

Suggested change
- touch ${prefix}.bam

Comment on lines +6 to +7
"--axis 'index'",
"--join 'outer'",

Copilot AI Oct 27, 2025

[nitpick] The inner quotes around 'index' and 'outer' are unnecessary. These values are passed as command-line arguments and contain no spaces or shell metacharacters, so the unquoted form is cleaner: --axis index and --join outer

Suggested change
- "--axis 'index'",
- "--join 'outer'",
+ "--axis index",
+ "--join outer",
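
A short sketch of why the two spellings behave the same, assuming the argument string is substituted into the script text before bash parses it (as is typical for module arguments in Nextflow). The shell removes the single quotes during parsing, so the program receives identical arguments either way. `show_args` is a hypothetical helper and `--name` is an invented flag used only to show when quotes do matter.

```bash
#!/usr/bin/env bash
# Print each received argument wrapped in angle brackets.
show_args() {
    printf '<%s>' "$@"
    printf '\n'
}

# Quotes around simple words are stripped by the shell: both calls
# print <--axis><index><--join><outer>
show_args --axis 'index' --join 'outer'
show_args --axis index --join outer

# Quotes only change the result when the value contains whitespace
# or shell metacharacters (hypothetical flag):
show_args --name 'two words'   # <--name><two words>
show_args --name two words     # <--name><two><words>
```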

| (grep -E 'attribute|value|units' || true) \
| sed -e 's/^attribute: //' -e 's/^value: //' -e 's/^units: //' \
| sed -e "s/\\\"/'/g" \
| awk 'NR%3!=0 {printf "\\\"%s\\\",", \$0} NR%3==0 {printf "\\\"%s\\\"\\n", \$0}' > irods_metadata.csv

Copilot AI Oct 27, 2025

[nitpick] The complex awk command with excessive escaping is difficult to read and maintain. Consider using a here-document or breaking it into multiple steps for better clarity.

Suggested change
- | awk 'NR%3!=0 {printf "\\\"%s\\\",", \$0} NR%3==0 {printf "\\\"%s\\\"\\n", \$0}' > irods_metadata.csv
+ | awk '
+ NR%3!=0 {printf "\"%s\",", $0}
+ NR%3==0 {printf "\"%s\"\n", $0}
+ ' > irods_metadata.csv
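
To make the pipeline's behavior concrete, here is a standalone sketch run against made-up input shaped like `imeta ls` output. It is written for a plain shell script; the extra backslashes and `\$0` in the module presumably exist because the command sits inside a double-quoted Nextflow script string, where one level of escaping is consumed before bash sees it, so the unescaped form may only be usable as-is in a single-quoted script block. The function names are hypothetical, and anchoring the `grep` pattern to the start of the line is my own assumption, added so that a header line containing one of the keywords cannot disturb the three-line grouping.

```bash
#!/usr/bin/env bash
# Made-up sample resembling `imeta ls` output (values are illustrative only).
sample_avus() {
    printf '%s\n' \
        'AVUs defined for dataObj example.txt:' \
        'attribute: sample' \
        'value: S1' \
        'units: ' \
        '----' \
        'attribute: note' \
        'value: say "hi"' \
        'units: text'
}

# Turn attribute/value/units triples into quoted CSV rows.
avus_to_csv() {
    (grep -E '^(attribute|value|units):' || true) \
    | sed -e 's/^attribute: //' -e 's/^value: //' -e 's/^units: //' \
    | sed -e "s/\"/'/g" \
    | awk '
        # Lines 1 and 2 of each triple: quoted field followed by a comma.
        NR % 3 != 0 { printf "\"%s\",", $0 }
        # Line 3 of each triple: quoted field followed by a newline.
        NR % 3 == 0 { printf "\"%s\"\n", $0 }
    '
}

sample_avus | avus_to_csv
# "sample","S1",""
# "note","say 'hi'","text"
```

Double quotes inside values are replaced with single quotes before the fields are wrapped, which is why `say "hi"` becomes `say 'hi'` in the output.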

singularity {
singularity.enabled = true
singularity.autoMounts = true
singularity.runOptions = '-B /lustre,/nfs'

Copilot AI Oct 27, 2025

Hardcoded mount paths /lustre,/nfs are environment-specific and may not exist in all execution environments. Consider making these configurable via parameters or environment variables.
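
The config itself is Groovy, but the fallback idea can be sketched in shell, for instance in a launcher script that prepares the bind option before starting the pipeline. This is only an illustration under stated assumptions: `build_bind_opts` and the `BIND_PATHS` variable are hypothetical names, not part of Nextflow or Singularity. The function keeps only the directories that exist on the current host and prints a `-B` option for them.

```bash
#!/usr/bin/env bash
# build_bind_opts "dir1,dir2,..."
# Prints "-B dir1,dir2" for the directories that exist, or nothing if none do.
# Hypothetical helper; assumes paths contain no commas or glob characters.
build_bind_opts() {
    local IFS=','
    local -a kept=()
    local p
    for p in $1; do
        if [[ -d "$p" ]]; then
            kept+=("$p")
        fi
    done
    if ((${#kept[@]} > 0)); then
        # "${kept[*]}" joins the entries with the first character of IFS (a comma).
        printf -- '-B %s\n' "${kept[*]}"
    fi
}

# Use a caller-provided list if set, otherwise fall back to the site defaults.
build_bind_opts "${BIND_PATHS:-/lustre,/nfs}"
```

On a host without `/lustre` or `/nfs` the last line prints nothing, so no bind option would be passed.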

singularity.enabled = true
singularity.autoMounts = true
singularity.runOptions = '-B /lustre,/nfs'
singularity.cacheDir = '/nfs/cellgeni/singularity/images/'

Copilot AI Oct 27, 2025

Hardcoded cache directory path is environment-specific and may not exist in all execution environments. Consider making this configurable via parameters or environment variables.

Suggested change
- singularity.cacheDir = '/nfs/cellgeni/singularity/images/'
+ singularity.cacheDir = params.singularity_cache_dir ?: System.getenv('SINGULARITY_CACHEDIR') ?: '/nfs/cellgeni/singularity/images/'

Removed unnecessary echo and touch commands, and added metadata output to a file.
@Claptar Claptar changed the title from "Added modules for to enable iRODS operations" to "Added modules to enable iRODS operations" Nov 18, 2025