This project uses deep learning to generate classical piano music in the style of Chopin through a text-based approach: MIDI files are converted into text representations that can be processed directly by recurrent neural networks (RNNs).
Best output found here: output/2024-pytorch-rnn/temp-0.8/tmp.mid
The project originally used Andrej Karpathy's char-rnn implementation for generating music. This version produced interesting results (preserved in output/2017-karpathy-LSTM/) but relied on external dependencies and older deep learning frameworks.
After 7 years, the project was modernized with a custom PyTorch-based implementation. The new version features:
- Simplified architecture using PyTorch's built-in LSTM modules
- More efficient training pipeline
- Better code organization and documentation
- Improved generation quality through modern deep learning practices
The current implementation uses a character-level RNN built with PyTorch, consisting of:
- Embedding layer for character encoding
- Multi-layer LSTM network
- Dropout regularization for preventing overfitting
- Linear output layer for character prediction
- Mini-batch training with CUDA GPU support
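A minimal sketch of this architecture is shown below. The class and parameter names (CharRNN, vocab_size, embed_dim) are illustrative, not the repository's actual identifiers; the hidden size, layer count, and dropout rate follow the training options listed later, while the embedding width is an assumption:

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """Character-level LSTM: embedding -> multi-layer LSTM -> linear head."""

    def __init__(self, vocab_size, embed_dim=64, hidden_size=256,
                 num_layers=2, dropout=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size, num_layers,
                            batch_first=True, dropout=dropout)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len) tensor of character indices
        emb = self.embed(x)                   # (batch, seq_len, embed_dim)
        out, hidden = self.lstm(emb, hidden)  # (batch, seq_len, hidden_size)
        return self.fc(out), hidden           # logits over the vocabulary

# Standard pattern for CUDA-backed mini-batch training
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CharRNN(vocab_size=96).to(device)  # vocab_size=96 is illustrative
```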
- MIDI Collection: Source MIDI files are stored in data/midi/
- Text Conversion: MIDI files are converted to ASCII format using the conversion tools in src/conversion/ (see the sketch after this list)
- Data Preprocessing: ASCII files are stored in data/ascii/ and processed into training data
- Training: The PyTorch model is trained on the text representation
- Generation: New music is sampled from the model at different temperature settings
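The exact token format produced by the tools in src/conversion/ is not documented here. As a rough illustration of the Text Conversion step, the following sketch uses the third-party mido library; the token scheme and file names are assumptions, not the project's actual encoding:

```python
import mido  # third-party MIDI library (pip install mido)

def midi_to_text(path):
    """Serialize note-on events as space-separated ASCII tokens.

    Illustrative scheme only: one token per note, carrying pitch and the
    delta time (in seconds) since the previous MIDI message.
    """
    tokens = []
    for msg in mido.MidiFile(path):  # iteration yields messages in real time
        if msg.type == "note_on" and msg.velocity > 0:
            tokens.append(f"n{msg.note}_t{round(msg.time, 2)}")
    return " ".join(tokens)

# Hypothetical file names following the repository layout
with open("data/ascii/example.txt", "w") as f:
    f.write(midi_to_text("data/midi/example.mid"))
```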
- Training options:
  - Hidden layer size: 256 (configurable)
  - Number of LSTM layers: 2 (configurable)
  - Dropout rate: 0.2
  - Batch size: 128
  - Sequence length: 100
- Temperature settings tested: 0.4, 0.8, 1.0
  - Higher temperatures produce more creative but potentially error-prone compositions
  - Lower temperatures generate more conservative, structured pieces
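Temperature works by rescaling the model's output logits before sampling, trading structure for diversity. A minimal sketch of temperature sampling (the project's actual generation code may differ):

```python
import torch

def sample_next_char(logits, temperature=0.8):
    """Sample one character index from a model's output logits.

    Dividing the logits by the temperature before the softmax sharpens
    the distribution when temperature < 1.0 (conservative output) and
    flattens it when temperature > 1.0 (more experimental output).
    """
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```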
The project maintains both historical and current outputs:
- output/2017-karpathy-LSTM/: Original char-rnn generated pieces
- output/2024-pytorch-rnn/: New PyTorch model generations
  - temp-0.4/: Conservative generation
  - temp-0.8/: Balanced creativity/structure
  - temp-1.0/: Most experimental generation
The MIDI files can be played with standard music software such as GarageBand or an online MIDI player.
- Extended Training: Increase training duration and dataset size
- Architecture Tuning:
  - Experiment with larger hidden layer sizes
  - Test different dropout rates
  - Try additional LSTM layers
- Data Augmentation: Add more Chopin compositions to the training set
- Evaluation Metrics: Implement quantitative measures of musical quality
The core implementation is in src/model/, featuring:
- train.py: Model definition and training logic
- generate.py: Text generation utilities
- Efficient batch processing
- CUDA optimization for GPU acceleration
- Configurable sequence length for capturing musical patterns
- Checkpoint saving/loading for iterative training
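Checkpointing can follow the standard PyTorch save/load pattern. This sketch assumes model, optimizer, epoch, and device objects from a training loop, and the checkpoint file name is illustrative:

```python
import torch

# Save model and optimizer state so training can resume later
torch.save({
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "epoch": epoch,
}, "checkpoint.pt")

# Resume: restore both states onto whichever device is available
ckpt = torch.load("checkpoint.pt", map_location=device)
model.load_state_dict(ckpt["model_state"])
optimizer.load_state_dict(ckpt["optimizer_state"])
```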
For detailed usage instructions and parameters, see the model documentation in src/model/README.md.