Conversation

@xinzhuo20
fDPO Implementation

Fine-tuning examples using Google Tunix with fDPO, a fine-grained preference learning algorithm for segment-level preference optimization.

fDPO Overview

fDPO (Fine-grained Direct Preference Optimization) extends traditional DPO by introducing segment-level preference granularity. Instead of applying a single global trade-off parameter β uniformly across all reasoning steps, fDPO separates responses into distinct components (description and reasoning) and applies adaptive, segment-specific β values.

Key Features

  • Segment-Level Optimization: Separates responses into description (R_desc) and reasoning (R_reason) components
  • Adaptive β Values: Dynamically computes β_desc and β_reason based on preference differentials
  • Balanced Learning: Prevents overfitting to simpler descriptive responses while properly optimizing complex reasoning paths
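As a rough illustration of the segment-level objective described above (a sketch, not the PR's actual code): one plausible reading is a DPO-style logistic loss whose margin is a β-weighted sum of per-segment log-ratio margins. The function names, the dict-based interface, and the additive combination of segments are assumptions; the adaptive computation of β_desc and β_reason from preference differentials is omitted and fixed values are used instead.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return x - math.log1p(math.exp(x)) if x < 0 else -math.log1p(math.exp(-x))

def fdpo_loss(policy, ref, beta_desc=0.1, beta_reason=0.5):
    """Segment-level DPO loss (illustrative sketch, not the PR's implementation).

    `policy` and `ref` map a segment name ("desc" or "reason") to a
    (chosen_logp, rejected_logp) pair: the summed token log-probabilities of
    that response segment under the policy and the frozen reference model.
    """
    total = 0.0
    for segment, beta in (("desc", beta_desc), ("reason", beta_reason)):
        pc, pr = policy[segment]
        rc, rr = ref[segment]
        # Standard DPO log-ratio margin, restricted to this segment, scaled
        # by its own trade-off parameter (fDPO computes these adaptively).
        total += beta * ((pc - rc) - (pr - rr))
    # DPO-style logistic loss over the combined segment margins.
    return -log_sigmoid(total)
```

With zero margins the loss is log 2 (chance level), and it decreases as the policy prefers the chosen response more strongly than the reference does, in either segment.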

Reference
https://plan-lab.github.io/projects/spatialreasoner/

@google-cla

google-cla bot commented Dec 16, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Collaborator

@abheesht17 abheesht17 left a comment

@xinzhuo20

  1. Are the results better than the baseline? Or do we still need to debug?
  2. Can we move the trainer inside experimental?

@xinzhuo20
Author

> @xinzhuo20
>
>   1. Are the results better than the baseline? Or do we still need to debug?
>   2. Can we move the trainer inside experimental?

  1. Still need to debug. @abheesht17, can you take a look? Thank you.
  2. Do you mean moving the trainer file to tunix/rl/experimental?
