@hqkqn32 hqkqn32 commented Dec 4, 2025

Description

Fixes #42502

This PR adds support for torch.Tensor labels in DataCollatorWithFlattening. Previously, the collator only worked with list labels and raised a TypeError when tensor labels were provided.

Problem

When users passed torch.Tensor objects as labels to DataCollatorWithFlattening, it failed with:

TypeError: can only concatenate list (not "Tensor") to list

This happened because the collator used list concatenation (+=) without checking if inputs were tensors.
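The message above is what Python raises whenever the right-hand operand of a list `+` is not itself a list. Here is a minimal sketch that reproduces the same class of error without PyTorch, using a standard-library `array` as a stand-in for a tensor. The helper name `concat_error` is hypothetical and not part of transformers.

```python
from array import array


def concat_error(prefix, values):
    """Try `prefix + values` and return the TypeError message, or None on success.

    `prefix` is a plain list; `values` may be any sequence-like object.
    """
    try:
        prefix + values  # list.__add__ only accepts another list
    except TypeError as exc:
        return str(exc)
    return None


# An array (like a tensor) is not a list, so concatenation fails.
print(concat_error([-100], array("i", [11, 12, 13])))

# A plain list concatenates without error, so None is returned.
print(concat_error([-100], [11, 12, 13]))
```

The same check against a `torch.Tensor` should report `"Tensor"` as the offending type, matching the traceback quoted in this PR.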

Solution

  • Added a type check using hasattr(obj, "tolist")
  • Converted tensors to lists before concatenation
  • Applied the conversion to both input_ids and labels
  • Kept backward compatibility with list inputs
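The steps above can be sketched roughly as follows. This is a simplified illustration, not the actual diff: the names `to_list` and `flatten_features` are hypothetical, the real `DataCollatorWithFlattening.__call__()` does more (separator handling, position ids, tensor conversion), and a standard-library `array` stands in for a `torch.Tensor` so the snippet runs without PyTorch.

```python
from array import array


def to_list(values):
    """Return `values` as a plain Python list.

    Objects exposing `.tolist()` (e.g. torch.Tensor, numpy.ndarray,
    array.array) are converted; anything else is passed through list().
    """
    return values.tolist() if hasattr(values, "tolist") else list(values)


def flatten_features(features):
    """Hypothetical, simplified flattening loop: concatenate all samples."""
    flat = {"input_ids": [], "labels": []}
    for sample in features:
        # Convert first, so list concatenation always sees plain lists.
        flat["input_ids"] = flat["input_ids"] + to_list(sample["input_ids"])
        flat["labels"] = flat["labels"] + to_list(sample["labels"])
    return flat


features = [
    {"input_ids": array("i", [1, 2, 3, 4]), "labels": array("i", [10, 11, 12, 13])},
    {"input_ids": [5, 6, 7], "labels": [14, 15, 16]},  # list inputs still work
]
print(flatten_features(features))
```

Duck typing on `tolist` avoids importing torch inside the check, and list inputs fall through unchanged, which is what keeps existing callers working.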

Changes Made

  • Modified src/transformers/data/data_collator.py:
    • Added tensor-to-list conversion in DataCollatorWithFlattening.__call__()
  • Added tests in tests/trainer/test_data_collator.py:
    • test_flattening_with_tensor_labels: Verifies tensor labels work
    • test_flattening_with_list_labels: Regression test for list labels

Testing

Before this PR ❌

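import torch
from transformers import DataCollatorWithFlattening

# Both input_ids and labels are tensors rather than lists.
features = [
    {"input_ids": torch.tensor([1, 2, 3, 4]), "labels": torch.tensor([10, 11, 12, 13])},
    {"input_ids": torch.tensor([5, 6, 7]), "labels": torch.tensor([14, 15, 16])},
]
collator = DataCollatorWithFlattening(return_tensors="pt")
batch = collator(features)  # TypeError: can only concatenate list (not "Tensor") to list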

After this PR ✅

features = [
    {"input_ids": torch.tensor([1, 2, 3, 4]), "labels": torch.tensor([10, 11, 12, 13])},
    {"input_ids": torch.tensor([5, 6, 7]), "labels": torch.tensor([14, 15, 16])},
]
collator = DataCollatorWithFlattening(return_tensors="pt")
batch = collator(features)  # No TypeError; returns the flattened batch

Checklist

  • Bug fix (non-breaking change which fixes an issue)
  • Tests added for the fix
  • All new and existing tests passed
  • Code follows the project's style guidelines

Member

@Rocketknight1 Rocketknight1 left a comment

Yup, changes and tests look good to me!

- Add tensor to list conversion in DataCollatorWithFlattening
- Convert input_ids and labels to list if they are tensors
- Add tests for both tensor and list labels
- Fixes huggingface#42599
@Rocketknight1 Rocketknight1 force-pushed the fix/datacollator-tensor-labels branch from c539cb9 to 558ea2a Compare December 4, 2025 13:30
@Rocketknight1
Member

(Give the CI a little bit and try again, I think it's just an intermittent issue)

@hqkqn32
Author

hqkqn32 commented Dec 4, 2025

@Rocketknight1 Thank you for the clarification! I'll trigger the CI again with an empty commit.

Development

Successfully merging this pull request may close these issues.

DataCollatorWithFlattening only accepts labels as list
