This repository was archived by the owner on Nov 19, 2025. It is now read-only.

@arendu arendu commented Nov 14, 2024

What does this PR do?

This PR makes the DPO dataset take its chat-format tokens from the model's config YAML instead of relying on chat/special tokens hardcoded in the JSONL data file.

Currently, each datapoint in a DPO JSONL data file looks like this:

{
  "prompt": "<extra_id_0>System\n\n<extra_id_1>User\nbacillus subtilus\n<extra_id_1>Assistant\n",
  "chosen_response": "Bacillus ... and industry alike.\n<extra_id_1>",
  "rejected_response": "The Bacillus ... fields of study.\n<extra_id_1>",
  "rejected_reward": 3,
  "chosen_reward": 4
}

With this PR, each datapoint should look like this (OpenAI list-of-messages format, with no chat/formatting tokens):

{
  "prompt": [
    {
      "role": "system",
      "content": ""
    },
    {
      "role": "user",
      "content": "bacillus subtilus"
    }
  ],
  "chosen_response": {
    "role": "assistant",
    "content": "Bacillus ... and industry alike."
  },
  "rejected_response": {
    "role": "assistant",
    "content": "The Bacillus ... fields of study."
  },
  "chosen_reward": 4,
  "rejected_reward": 3
}
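To illustrate how the same prompt string can be rebuilt from the token-free format, here is a minimal sketch. The class `ChatTokens`, its field names, and both functions are hypothetical and are not NeMo-Aligner's actual API; the default token values are simply copied from the old-format example above, and in practice they would come from the model's config YAML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatTokens:
    # Hypothetical field names; defaults mirror the old-format example only.
    system_turn_start: str = "<extra_id_0>"
    turn_start: str = "<extra_id_1>"
    end_of_turn: str = "\n"


def render_prompt(messages, tokens=ChatTokens()):
    """Render a list of {"role", "content"} messages into one prompt string."""
    parts = []
    for msg in messages:
        # The system turn uses its own start token; every other turn shares one.
        start = tokens.system_turn_start if msg["role"] == "system" else tokens.turn_start
        parts.append(f"{start}{msg['role'].capitalize()}\n{msg['content']}{tokens.end_of_turn}")
    # Open the assistant turn so that the response text follows directly.
    parts.append(f"{tokens.turn_start}Assistant\n")
    return "".join(parts)


def render_response(message, tokens=ChatTokens()):
    """Render a response message, closing it the way the old format did."""
    return f"{message['content']}{tokens.end_of_turn}{tokens.turn_start}"
```

With these defaults, the new-format prompt above renders to the same string as the old-format "prompt" field, so switching models would only require different token values rather than a regenerated data file.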

Additionally, this PR adds a script to convert old data files into the new format:

python nemo_aligner/data/nlp/scripts/undo_special_tokens.py <path_to_old_format_dpo_jsonl_file>

The converted file is written to the same location as the old-format file.
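For readers who want to see what such a conversion involves, the following is a rough sketch of the per-record transformation. It is not the contents of undo_special_tokens.py; the function `old_to_new` and the two marker constants are hypothetical, and the markers are assumed from the old-format example in this description. It needs Python 3.9+ for `str.removesuffix`.

```python
import re

# Assumed markers, copied from the old-format example; real files may differ.
SYSTEM_START = "<extra_id_0>"
TURN_START = "<extra_id_1>"


def old_to_new(record):
    """Convert one old-format DPO record into the list-of-messages format."""
    pattern = f"{re.escape(SYSTEM_START)}|{re.escape(TURN_START)}"
    # Text before the first marker is empty, so drop the first split element.
    turns = re.split(pattern, record["prompt"])[1:]
    messages = []
    for turn in turns:
        # Each turn is "<Role>\n<content>\n"; the role name ends at the first newline.
        role, _, content = turn.partition("\n")
        messages.append({"role": role.lower(), "content": content.removesuffix("\n")})
    # The trailing empty assistant header only opens the response, so drop it.
    if messages and messages[-1] == {"role": "assistant", "content": ""}:
        messages.pop()

    def response(text):
        # Strip the closing turn marker and the newline that precedes it.
        content = text.removesuffix(TURN_START).removesuffix("\n")
        return {"role": "assistant", "content": content}

    return {
        "prompt": messages,
        "chosen_response": response(record["chosen_response"]),
        "rejected_response": response(record["rejected_response"]),
        "chosen_reward": record["chosen_reward"],
        "rejected_reward": record["rejected_reward"],
    }
```

Applied line by line with `json.loads` and `json.dumps`, a function like this would turn the old-format example into the new-format example shown in this description.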

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore all model states?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

@arendu arendu requested review from gshennvm and terrykong November 15, 2024 06:33
@arendu arendu marked this pull request as ready for review November 15, 2024 06:36
@arendu arendu changed the title from "Adithyare/dpo data refac" to "DPO data format refactor" Nov 15, 2024
@arendu arendu requested a review from terrykong November 18, 2024 22:38
@terrykong terrykong changed the title from "DPO data format refactor" to "feat: support new DPO data format" Nov 21, 2024

@terrykong terrykong left a comment

TODOs

  • Compatibility test
  • Stretch: update the dpo.sh template test script to convert the train data JSON into this new format

@arendu arendu requested a review from terrykong November 21, 2024 05:29
@arendu arendu added the CI label Nov 21, 2024
@github-actions github-actions bot removed the CI label Nov 21, 2024
@terrykong terrykong force-pushed the adithyare/dpo_data_refac branch from d32515c to a112c19 November 22, 2024 01:49
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong force-pushed the adithyare/dpo_data_refac branch from a112c19 to 0d3b8ee November 22, 2024 01:50
@terrykong

closing in favor of #403

@terrykong terrykong closed this Nov 22, 2024
@terrykong terrykong mentioned this pull request Nov 22, 2024
8 tasks
@terrykong terrykong reopened this Nov 22, 2024
@terrykong terrykong changed the title from "feat: support new DPO data format" to "feat: support new DPO data format and update SFT config to use override API" Dec 3, 2024
arendu and others added 2 commits December 3, 2024 23:20
Signed-off-by: arendu <adithya.r@gmail.com>
for more information, see https://pre-commit.ci

Signed-off-by: NeMo-Aligner CI <nemo-aligner-ci@nvidia.com>
terrykong previously approved these changes Dec 3, 2024
@terrykong terrykong added the Run CICD Set + un-set to retrigger (add after r*.*.* labels) label Dec 3, 2024
@arendu arendu added Run CICD Set + un-set to retrigger (add after r*.*.* labels) and removed Run CICD Set + un-set to retrigger (add after r*.*.* labels) labels Dec 4, 2024
@arendu arendu added Run CICD Set + un-set to retrigger (add after r*.*.* labels) and removed Run CICD Set + un-set to retrigger (add after r*.*.* labels) labels Dec 4, 2024
@terrykong terrykong enabled auto-merge (squash) December 4, 2024 01:58
@terrykong terrykong merged commit 5d4b2a7 into main Dec 4, 2024
18 checks passed
@terrykong terrykong deleted the adithyare/dpo_data_refac branch December 4, 2024 02:20
terrykong added a commit that referenced this pull request Dec 5, 2024
…de API (#405)

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: NeMo-Aligner CI <nemo-aligner-ci@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>

return output_dict

def convert(self, messages):

Do you think it could support apply_chat_template (https://huggingface.co/docs/transformers/main/en/chat_templating) for Hugging Face tokenizers, which most open-source LLMs have adopted?
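The thread does not record an answer. As a sketch of what the question has in mind, the new list-of-messages prompt could be passed straight to a Hugging Face tokenizer's `apply_chat_template` method. The helper name `render_with_chat_template` and the output keys are hypothetical; only `apply_chat_template` and its `tokenize` and `add_generation_prompt` arguments come from the transformers library, and none of this is confirmed to be supported by the PR.

```python
def render_with_chat_template(tokenizer, record):
    """Render a new-format DPO record using the tokenizer's own chat template.

    `tokenizer` is assumed to be a Hugging Face tokenizer that ships a chat
    template, for example one loaded with AutoTokenizer.from_pretrained(...).
    """
    prompt = tokenizer.apply_chat_template(
        record["prompt"],
        tokenize=False,  # return the formatted string rather than token ids
        add_generation_prompt=True,  # open the assistant turn for the response
    )
    return {
        "prompt": prompt,
        "chosen": record["chosen_response"]["content"],
        "rejected": record["rejected_response"]["content"],
    }
```

Because the data file no longer embeds special tokens, the same record could in principle be rendered either from config YAML tokens or from a model's bundled template.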
