
Conversation

arrdel (Contributor) commented on Dec 6, 2025

Fixes #42661

Problem

When using TP with a user-provided parallelism_config, the Trainer overwrites the entire config object, discarding the user's settings.

Solution

  • Update only the tp_size attribute when a config is provided
  • Create a new config only when none is provided
  • Preserve all user-provided parallelism settings

Changes

  • Fixed logic in trainer.py lines 5074-5087
  • Removed redundant nested checks
  • User-provided settings (dp_size, pp_size, cp_backend, etc.) are now preserved

Fixes huggingface#42661

When using TP with a user-provided parallelism_config, the Trainer was
incorrectly overwriting the entire config object with a new
ParallelismConfig(tp_size=model.tp_size), discarding all user-provided
settings (dp_size, pp_size, cp_backend, etc.).

Changes:
- If the user provides a parallelism_config, update only its tp_size attribute
- If no config is provided, create a new ParallelismConfig with tp_size
- Removed redundant nested condition check
- Fixed logic flow to check accelerate version first

This ensures user-provided parallelism configurations are preserved
during TP-only training.
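
For illustration, here is a self-contained toy sketch of the old vs. new behavior. This is not the actual trainer.py or accelerate code: the ParallelismConfig stand-in and the resolve_parallelism_config helper are hypothetical, with field names taken from the PR description.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ParallelismConfig:
        # Toy stand-in for accelerate.ParallelismConfig; field names follow the PR text.
        tp_size: int = 1
        dp_size: int = 1
        pp_size: int = 1
        cp_backend: Optional[str] = None

    def resolve_parallelism_config(
        user_config: Optional[ParallelismConfig], model_tp_size: int
    ) -> ParallelismConfig:
        """Fixed behavior: reuse the user's config and only sync tp_size."""
        if user_config is not None:
            # Old behavior replaced user_config with ParallelismConfig(tp_size=model_tp_size),
            # silently dropping dp_size, pp_size, cp_backend, etc.
            user_config.tp_size = model_tp_size
            return user_config
        # Only create a fresh config when the user did not provide one.
        return ParallelismConfig(tp_size=model_tp_size)

    cfg = ParallelismConfig(dp_size=2, cp_backend="gloo")
    resolved = resolve_parallelism_config(cfg, model_tp_size=4)
    assert resolved is cfg and resolved.tp_size == 4 and resolved.dp_size == 2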
Copilot AI review requested due to automatic review settings December 6, 2025 02:03

Copilot AI left a comment

Pull request overview

This PR fixes a bug where the Trainer was discarding user-provided parallelism_config settings when Tensor Parallelism (TP) was enabled. Previously, the entire config object was being overwritten; now only the tp_size attribute is updated while preserving other user settings.

Key changes:

  • Fixed conditional logic to update tp_size in existing user configs instead of creating a new config
  • Restructured the version checks to be clearer and avoid redundant nesting

Comment on lines +5078 to +5085

    if self.args.parallelism_config is not None:
        # Update tp_size in user-provided config instead of overwriting it
        self.args.parallelism_config.tp_size = self.model.tp_size
    else:
        # Only create new config if user didn't provide one
        from accelerate import ParallelismConfig

        args["parallelism_config"] = ParallelismConfig(tp_size=self.model.tp_size)

(This block sits inside the existing accelerate-version check; when the installed accelerate is too old, the code still raises ValueError("Requires accelerate>1.10.1 to use Tensor Parallelism.").)

Copilot AI commented on Dec 6, 2025

This bug fix should have test coverage to ensure that user-provided parallelism_config objects are preserved when TP is enabled. Consider adding a test that:

  1. Creates a ParallelismConfig with custom settings (e.g., dp_size, cp_size)
  2. Passes it to TrainingArguments along with a model that has tp_size > 1
  3. Verifies that after trainer initialization, the parallelism_config still has the original custom settings AND the updated tp_size
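
A minimal sketch of such a test, assuming pytest, a multi-device TP-capable environment, and that accelerate's ParallelismConfig accepts a cp_size argument; the test name, tiny checkpoint, and exact field names are illustrative and not part of this PR.

    # Hypothetical test sketch; adjust field names to the installed accelerate version.
    from accelerate import ParallelismConfig
    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

    def test_user_parallelism_config_preserved(tmp_path):
        # 1. A config with a custom, non-TP setting that must survive Trainer init.
        user_config = ParallelismConfig(cp_size=2)

        # 2. Pass it through TrainingArguments together with a TP-sharded model.
        args = TrainingArguments(output_dir=str(tmp_path), parallelism_config=user_config)
        model = AutoModelForCausalLM.from_pretrained(
            "hf-internal-testing/tiny-random-LlamaForCausalLM", tp_plan="auto"
        )
        trainer = Trainer(model=model, args=args)

        # 3. The original object is kept; only tp_size is synced to the model.
        assert trainer.args.parallelism_config is user_config
        assert trainer.args.parallelism_config.cp_size == 2
        assert trainer.args.parallelism_config.tp_size == model.tp_size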

Development

Successfully merging this pull request may close these issues.

Bug in processing of self.args.parallelism_config inside Trainer
