Time and Space Limits for Deep Learning on UoS HPCs #18

@TheLostLambda

Hello! I'm looking to retrain, or train a larger version of, a model inspired by https://github.com/nadavbra/protein_bert and was wondering:

  1. Where should I store the training data? It's over 1 TB, but it seems that most storage areas of that size on the Sheffield HPCs are temporary only?
  2. Could that data be persisted, and could I reserve GPU usage for a few weeks to a month? The original model was trained for a month, though likely on a lower-powered GPU! (If a single month-long reservation isn't possible, I'm imagining a checkpoint-and-requeue approach instead; see the sketch after this list.)
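
To make the second question concrete: since most clusters cap job walltime well below a month, my plan would be to checkpoint training state to persistent storage and resume in a chain of shorter jobs. Here's a minimal sketch of what I mean, assuming TensorFlow ≥ 2.8 with Keras (which I believe ProteinBERT uses); the toy model, toy data, and the BACKUP_DIR path are all hypothetical stand-ins:

```python
# Minimal sketch, not ProteinBERT itself: resumable Keras training so a job
# killed at the cluster's walltime limit can be requeued and pick up where
# it left off. Assumes TensorFlow >= 2.8; BACKUP_DIR is a hypothetical
# persistent path on the HPC filesystem.
import tensorflow as tf

BACKUP_DIR = "/path/to/persistent/storage/train_backup"  # hypothetical

# Toy model and data standing in for the real model and the >1 TB dataset.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(128,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
x = tf.random.normal((1024, 128))
y = tf.random.normal((1024, 1))

# BackupAndRestore writes training state each epoch; when the same script
# is rerun after pre-emption, fit() resumes from the last completed epoch
# rather than starting over.
model.fit(
    x, y,
    epochs=100,
    callbacks=[tf.keras.callbacks.BackupAndRestore(backup_dir=BACKUP_DIR)],
)
```

The idea is that each queued job just reruns this script, so no single job ever needs a month-long reservation. Mostly I want to check whether that's the expected pattern here, and where the backup directory should live.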

The GitHub repository for ProteinBERT seems to have some pretty nice instructions for retraining; I'm just wondering whether I'll run into any issues using that much disk space and that much GPU time!

Thanks a ton,
Brooks
