Draft
51 commits
8526827
Bump openai to >=1.99.1,<2.0 for vllm compat
Oct 11, 2025
99e6644
try impl
bxyu-nvidia Oct 11, 2025
851ce0b
add to configs
bxyu-nvidia Oct 11, 2025
124b07c
fix tests
bxyu-nvidia Oct 11, 2025
d873f71
fix function bug
HeyyyyyyG Oct 12, 2025
e2b7c8a
start test
bxyu-nvidia Oct 13, 2025
30efa96
default to reasoning parser true
bxyu-nvidia Oct 13, 2025
509e5b0
fix test
bxyu-nvidia Oct 13, 2025
0eccc58
fix tests and logic
bxyu-nvidia Oct 13, 2025
037dea5
only add reasoning content if match
bxyu-nvidia Oct 14, 2025
314600e
add test
bxyu-nvidia Oct 14, 2025
5b0e2c2
feat(mcqa): Add template_metadata support with custom re
psgundecha-nv Oct 3, 2025
2501bd1
Update mcqa app
psgundecha-nv Oct 7, 2025
17982bd
Fix MCQA extraction validation for template_metadata regex
psgundecha-nv Oct 8, 2025
3fb1ac7
test(mcqa): Add tests and documentation for template_metadata
psgundecha-nv Oct 10, 2025
ab82c29
Add *.backup to .gitignore
psgundecha-nv Oct 14, 2025
08e384a
Merge NVIDIA-NeMo/Gym PR #129 into tkonuk/compat-openai-199
ertkonuk Oct 14, 2025
ea2f33e
Merge NVIDIA-NeMo/Gym PR #128 into tkonuk/compat-openai-199
ertkonuk Oct 14, 2025
2b27a1a
fix none + str
bxyu-nvidia Oct 15, 2025
a7c3880
max steps default to 1
HeyyyyyyG Oct 15, 2025
b2e3e74
skip_special_tokens to false
HeyyyyyyG Oct 15, 2025
f72b41c
Initial resource server and tests for BLEU score
hrossnv Oct 15, 2025
fa600ea
Add example and fix BLEU warnings
hrossnv Oct 15, 2025
23c9a83
Add dataset license and clean up app.py
hrossnv Oct 16, 2025
0a99cef
Merge remote-tracking branch 'origin/main' into hross/mt-verifiers
hrossnv Oct 16, 2025
4bd62c3
Fix translation field names to match dataset
hrossnv Oct 17, 2025
ce30909
Temporary fix to override mecab-ko dependency of sacrebleu to be comp…
hrossnv Oct 17, 2025
f1f27e5
Fix workbench
ertkonuk Oct 17, 2025
42b5e2f
Add BLEU translation validation set and fix config
hrossnv Oct 20, 2025
590e37e
Fix BLEU tests after renaming fields
hrossnv Oct 22, 2025
9bd1181
Fix BLEU license
hrossnv Oct 23, 2025
ee820ae
Add initial COMET translation verifier with tests
hrossnv Oct 23, 2025
6106bfa
Add tests for reference-free COMET verifier
hrossnv Oct 24, 2025
11c02d8
Add reference-free COMET translation config, fix BLEU and COMET examp…
hrossnv Oct 24, 2025
b157fe9
Update BLEU and COMET examples
hrossnv Oct 27, 2025
5a40bca
Ray comp coding infra (#195)
sdevare-nv Oct 22, 2025
a03b385
Fix ray version mismatch (#231)
sdevare-nv Oct 25, 2025
53f2aef
fix version mismatch
HeyyyyyyG Oct 29, 2025
70e2bd8
Fix COMET to default to CPU
hrossnv Oct 30, 2025
31038d3
Merge remote-tracking branch 'origin/tkonuk/compat-openai-199' into h…
hrossnv Oct 30, 2025
3992f78
Initial support for MetricX translation verifier
hrossnv Oct 31, 2025
4611813
Update resource server names to match existing ones; add dataset metr…
hrossnv Oct 31, 2025
5f2f4ef
Initial setup of translation LLM-as-judge resource server
hrossnv Nov 5, 2025
4b60056
Add translation LLM-as-judge datasets
hrossnv Nov 6, 2025
324b940
Update translation verifiers to configure thinking split
hrossnv Nov 6, 2025
7eae181
Merge remote-tracking branch 'origin/hross/mt-verifiers' into pjin/hr…
pjin-nvidia Dec 10, 2025
950994e
Fixup pyproject.toml, requirements.txt, uv.lock.
pjin-nvidia Dec 10, 2025
34f59a4
Revert max_steps = 1.
pjin-nvidia Dec 10, 2025
5f7f593
Revert simple_agent to main.
pjin-nvidia Dec 10, 2025
cfebf1d
Finish reverting.
pjin-nvidia Dec 10, 2025
eb19457
Newline.
pjin-nvidia Dec 10, 2025
2 changes: 1 addition & 1 deletion .gitignore
@@ -16,7 +16,7 @@ fastspeech_output
.bash_history.local

# Byte-compiled / optimized / DLL files
__pycache__/
**/__pycache__/
*.py[cod]
*$py.class
**.pyc
1 change: 1 addition & 0 deletions nemo_gym/config_types.py
@@ -312,6 +312,7 @@ class DatasetConfig(BaseModel):
Literal["MIT"],
Literal["Creative Commons Attribution 4.0 International"],
Literal["Creative Commons Attribution-ShareAlike 4.0 International"],
Literal["NVIDIA Internal Use Only, Do Not Distribute"],
Literal["TBD"],
Literal["MIT"],
]
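The hunk above adds one more allowed license string to `DatasetConfig`. As a rough illustration of how a set of `Literal` values constrains a field, here is a minimal standard-library sketch. It is not the project's code: the real class is a pydantic `BaseModel`, and the `License` alias and `check_license` helper below are hypothetical names that list only a few of the values.

```python
from typing import Literal, get_args

# Hypothetical alias with a subset of the license strings shown in the hunk.
License = Literal[
    "MIT",
    "TBD",
    "NVIDIA Internal Use Only, Do Not Distribute",
]


def check_license(value: str) -> str:
    """Return value unchanged if it is one of the allowed license strings."""
    allowed = get_args(License)
    if value not in allowed:
        raise ValueError(f"unsupported license {value!r}; expected one of {allowed}")
    return value


# The newly added string is accepted; anything else is rejected.
accepted = check_license("NVIDIA Internal Use Only, Do Not Distribute")
```

A dataset entry whose `license` value is not listed would fail validation, which is presumably why the new string had to be added before the configs in this PR could use it.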
11 changes: 11 additions & 0 deletions resources_servers/translation_bleu/README.md
@@ -0,0 +1,11 @@
# Description

Data links: ?

# Licensing information
Code: Apache 2.0
Data: NVIDIA Internal Use Only, Do Not Distribute

Dependencies
- nemo_gym: Apache 2.0
- sacrebleu: Apache 2.0
110 changes: 110 additions & 0 deletions resources_servers/translation_bleu/app.py
@@ -0,0 +1,110 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict

from fastapi import FastAPI
from sacrebleu.metrics import BLEU

from nemo_gym.base_resources_server import (
    BaseResourcesServerConfig,
    BaseVerifyRequest,
    BaseVerifyResponse,
    SimpleResourcesServer,
)


class TranslationBleuResourcesServerConfig(BaseResourcesServerConfig):
    reasoning_split_word: str = "</think>"


class TranslationBleuVerifyRequest(BaseVerifyRequest):
    trg_txt: str
    trg_lang: str


class TranslationBleuVerifyResponse(BaseVerifyResponse):
    trg_txt: str
    trg_lang: str
    extracted_answer: str


class TranslationBleuResourcesServer(SimpleResourcesServer):
    config: TranslationBleuResourcesServerConfig

    TOKENIZER_MAP: Dict[str, str] = {
        "zh": "zh",
        "zh-cn": "zh",
        "zh-tw": "zh",
        "zho-CN": "zh",
        "zho_simpl": "zh",
        "ja": "ja-mecab",
        "jpn": "ja-mecab",
        "th": "flores200",
        "ko": "ko-mecab",
    }

    def setup_webserver(self) -> FastAPI:
        app = super().setup_webserver()

        # Additional server routes go here! e.g.:
        # app.post("/get_weather")(self.get_weather)

        return app

    async def verify(self, body: TranslationBleuVerifyRequest) -> TranslationBleuVerifyResponse:
        assistant_responses = []
        for output_item in body.response.output:
            if output_item.type != "message":
                continue

            for content_item in output_item.content:
                if content_item.type != "output_text":
                    continue

                assistant_responses.append(content_item.text)

        combined_response = "".join(assistant_responses)

        (reward, extracted_answer) = self._verify_answer(
            ground_truth=body.trg_txt, target_lang=body.trg_lang, model_response=combined_response
        )

        return TranslationBleuVerifyResponse(**body.model_dump(), extracted_answer=extracted_answer, reward=reward)

    def _verify_answer(self, ground_truth: str, target_lang: str, model_response: str) -> tuple[float, str]:
        extracted_answer = self._extract_answer(model_response)

        if target_lang in self.TOKENIZER_MAP:
            tokenize = self.TOKENIZER_MAP[target_lang]
        else:
            tokenize = None
        # Use effective_order for sentence-level BLEU
        bleu = BLEU(trg_lang=target_lang, effective_order=True, tokenize=tokenize)

        bleu_output = bleu.sentence_score(extracted_answer, [ground_truth])
        # TODO Do we want to report any other BLEU outputs?
        bleu_score = bleu_output.score
        reward = bleu_score / 100.0

        return reward, extracted_answer

    def _extract_answer(self, model_response: str) -> str:
        # Strip any thinking
        no_think_response = model_response.split(self.config.reasoning_split_word)[-1]
        no_think_response = no_think_response.strip()
        return no_think_response


if __name__ == "__main__":
    TranslationBleuResourcesServer.run_webserver()
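The answer extraction, tokenizer lookup, and reward scaling in `_extract_answer` and `_verify_answer` can be tried out without `sacrebleu` or the server classes. The following is a minimal standalone sketch, not the PR's code: the function names `extract_answer`, `pick_tokenizer`, and `bleu_to_reward` are hypothetical, the tokenizer map is a shortened copy, and the BLEU score is taken as an input rather than computed.

```python
from typing import Dict, Optional

# Stand-ins for config.reasoning_split_word and a subset of TOKENIZER_MAP.
REASONING_SPLIT_WORD = "</think>"
TOKENIZER_MAP: Dict[str, str] = {
    "zh": "zh",
    "zh-cn": "zh",
    "ja": "ja-mecab",
    "ko": "ko-mecab",
}


def extract_answer(model_response: str, split_word: str = REASONING_SPLIT_WORD) -> str:
    """Keep only the text after the last occurrence of split_word, stripped."""
    return model_response.split(split_word)[-1].strip()


def pick_tokenizer(target_lang: str) -> Optional[str]:
    """Return the mapped tokenizer name, or None when the language is not listed."""
    return TOKENIZER_MAP.get(target_lang)


def bleu_to_reward(bleu_score: float) -> float:
    """Rescale a BLEU score on the 0-100 scale to a reward on the 0-1 scale."""
    return bleu_score / 100.0


answer = extract_answer("<think>Short greeting.</think>\nHello there")
```

Note one consequence of splitting on the closing tag: if the split word never appears in the response (for example, when generation stops before the reasoning ends), the whole response is returned and scored as the answer.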
38 changes: 38 additions & 0 deletions resources_servers/translation_bleu/configs/translation_bleu.yaml
@@ -0,0 +1,38 @@
translation_bleu:
  resources_servers:
    translation_bleu:
      entrypoint: app.py
      domain: translation
translation_bleu_simple_agent:
  responses_api_agents:
    simple_agent:
      entrypoint: app.py
      resources_server:
        type: resources_servers
        name: translation_bleu
      model_server:
        type: responses_api_models
        name: policy_model
      datasets:
      - name: train
        type: train
        jsonl_fpath: resources_servers/translation_bleu/data/riva_mt_v3_nothinkInSys_train.jsonl
        num_repeats: 1
        gitlab_identifier:
          dataset_name: riva_mt_v3_nothinkInSys_train
          version: 0.0.3
          artifact_fpath: riva_mt_v3_nothinkInSys_train.jsonl
        license: NVIDIA Internal Use Only, Do Not Distribute
      - name: validation
        type: validation
        jsonl_fpath: resources_servers/translation_bleu/data/riva_mt_v3_nothinkInSys_validation.jsonl
        num_repeats: 1
        gitlab_identifier:
          dataset_name: riva_mt_v3_nothinkInSys_validation
          version: 0.0.1
          artifact_fpath: riva_mt_v3_nothinkInSys_validation.jsonl
        license: NVIDIA Internal Use Only, Do Not Distribute
      - name: example
        type: example
        jsonl_fpath: resources_servers/translation_bleu/data/example.jsonl
        num_repeats: 1
5 changes: 5 additions & 0 deletions resources_servers/translation_bleu/data/.gitignore
@@ -0,0 +1,5 @@
*train.jsonl
*validation.jsonl
*train_prepare.jsonl
*validation_prepare.jsonl
*example_prepare.jsonl
5 changes: 5 additions & 0 deletions resources_servers/translation_bleu/data/example.jsonl
@@ -0,0 +1,5 @@
{"responses_create_params": {"input": [{"role": "user", "content": "Convert the following text into English. Text:\n```\n¿Por qué malgastar el tiempo buscando las llaves?\n```"}]}, "src_txt": "¿Por qué malgastar el tiempo buscando las llaves?", "trg_txt": "Why waste your time looking for the keys?", "src_lang": "es-us", "trg_lang": "en"}
{"responses_create_params": {"input": [{"role": "user", "content": "Translate this into English: Ihm wurden das Konservatorium und das Theater gewidmet, in dem jedes Jahr das „Rossini Opera Festival“ stattfindet, das Opernliebhaber aus aller Welt anlockt."}]}, "src_txt": "Ihm wurden das Konservatorium und das Theater gewidmet, in dem jedes Jahr das „Rossini Opera Festival“ stattfindet, das Opernliebhaber aus aller Welt anlockt.", "trg_txt": "Every year, the conservatory and theatre, both named after him, host the Rossini Opera Festival, drawing enthusiasts from around the world.", "src_lang": "de", "trg_lang": "en"}
{"responses_create_params": {"input": [{"role": "user", "content": "Convert the following text into French. Text:\n```\nAll questions have been correctly answered by the treasurer Thomas Kräuchi.\n```"}]}, "src_txt": "All questions have been correctly answered by the treasurer Thomas Kräuchi.", "trg_txt": "Le trésorier Thomas Kräuchi a répondu correctement à toutes les questions.", "src_lang": "en", "trg_lang": "fr"}
{"responses_create_params": {"input": [{"role": "user", "content": "Convert the following text into Japanese. Text:\n```\nThe next picture shows the atoms emitting photons. Of course, in reality photons are a lot smaller than those in the picture.\n```"}]}, "src_txt": "The next picture shows the atoms emitting photons. Of course, in reality photons are a lot smaller than those in the picture.", "trg_txt": "次の写真は、原子が光子を放出している様子です。もちろん、実際には写真よりもはるかに微小です。", "src_lang": "en", "trg_lang": "ja"}
{"responses_create_params": {"input": [{"role": "user", "content": "Translate the following text from Simplified Chinese to English: 比赛是在草地上进行的,洞周围的草被修剪得更短,被称为果岭。"}]}, "src_txt": "比赛是在草地上进行的,洞周围的草被修剪得更短,被称为果岭。", "trg_txt": "The game is played on grass, and the grass around the hole is mown shorter and called the green.", "src_lang": "zh-cn", "trg_lang": "en"}
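Each row of `example.jsonl` carries the request parameters plus the fields the verifier reads (`trg_txt`, `trg_lang`). A minimal sketch of loading and checking one such row with the standard library follows. The row text is shortened for illustration, and `REQUIRED_KEYS` and `parse_row` are hypothetical names, not part of the PR.

```python
import json

# A row in the shape of example.jsonl; the text is shortened for illustration.
line = (
    '{"responses_create_params": {"input": [{"role": "user", '
    '"content": "Translate this into English: Hola"}]}, '
    '"src_txt": "Hola", "trg_txt": "Hello", "src_lang": "es-us", "trg_lang": "en"}'
)

# Top-level keys present in every row of the example file.
REQUIRED_KEYS = ("responses_create_params", "src_txt", "trg_txt", "src_lang", "trg_lang")


def parse_row(raw: str) -> dict:
    """Parse one JSONL line and check that the expected top-level keys exist."""
    row = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in row]
    if missing:
        raise ValueError(f"row is missing keys: {missing}")
    return row


row = parse_row(line)
# The prompt sent to the model is the last message in the input list.
user_prompt = row["responses_create_params"]["input"][-1]["content"]
```

Running a check like this over a dataset file before training can catch field-name mismatches of the kind the commit "Fix translation field names to match dataset" addressed.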
5 changes: 5 additions & 0 deletions resources_servers/translation_bleu/data/example_nothink.jsonl
@@ -0,0 +1,5 @@
{"responses_create_params": {"input": [{"role": "system", "content":"/no_think"}, {"role": "user", "content": "Convert the following text into English. Text:\n```\n¿Por qué malgastar el tiempo buscando las llaves?\n```"}]}, "src_txt": "¿Por qué malgastar el tiempo buscando las llaves?", "trg_txt": "Why waste your time looking for the keys?", "src_lang": "es-us", "trg_lang": "en"}
{"responses_create_params": {"input": [{"role": "system", "content":"/no_think"}, {"role": "user", "content": "Translate this into English: Ihm wurden das Konservatorium und das Theater gewidmet, in dem jedes Jahr das „Rossini Opera Festival“ stattfindet, das Opernliebhaber aus aller Welt anlockt."}]}, "src_txt": "Ihm wurden das Konservatorium und das Theater gewidmet, in dem jedes Jahr das „Rossini Opera Festival“ stattfindet, das Opernliebhaber aus aller Welt anlockt.", "trg_txt": "Every year, the conservatory and theatre, both named after him, host the Rossini Opera Festival, drawing enthusiasts from around the world.", "src_lang": "de", "trg_lang": "en"}
{"responses_create_params": {"input": [{"role": "system", "content":"/no_think"}, {"role": "user", "content": "Convert the following text into French. Text:\n```\nAll questions have been correctly answered by the treasurer Thomas Kräuchi.\n```"}]}, "src_txt": "All questions have been correctly answered by the treasurer Thomas Kräuchi.", "trg_txt": "Le trésorier Thomas Kräuchi a répondu correctement à toutes les questions.", "src_lang": "en", "trg_lang": "fr"}
{"responses_create_params": {"input": [{"role": "system", "content": "/no_think"}, {"role": "user", "content": "Convert the following text into Japanese. Text:\n```\nThe next picture shows the atoms emitting photons. Of course, in reality photons are a lot smaller than those in the picture.\n```"}]}, "src_txt": "The next picture shows the atoms emitting photons. Of course, in reality photons are a lot smaller than those in the picture.", "trg_txt": "次の写真は、原子が光子を放出している様子です。もちろん、実際には写真よりもはるかに微小です。", "src_lang": "en", "trg_lang": "ja"}
{"responses_create_params": {"input": [{"role": "system", "content": "/no_think"}, {"role": "user", "content": "Translate the following text from Simplified Chinese to English: 比赛是在草地上进行的,洞周围的草被修剪得更短,被称为果岭。"}]}, "src_txt": "比赛是在草地上进行的,洞周围的草被修剪得更短,被称为果岭。", "trg_txt": "The game is played on grass, and the grass around the hole is mown shorter and called the green.", "src_lang": "zh-cn", "trg_lang": "en"}
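The rows in `example_nothink.jsonl` differ from those in `example.jsonl` only by a leading system message with the content `/no_think`. The PR does not say how the file was produced; the sketch below shows one way such a variant could be derived, with `add_no_think` as a hypothetical helper name.

```python
import copy

NO_THINK_MESSAGE = {"role": "system", "content": "/no_think"}


def add_no_think(row: dict) -> dict:
    """Return a copy of row with a '/no_think' system message placed first."""
    out = copy.deepcopy(row)
    messages = out["responses_create_params"]["input"]
    # Skip the insert if a system message already leads the conversation.
    if not (messages and messages[0].get("role") == "system"):
        messages.insert(0, dict(NO_THINK_MESSAGE))
    return out


# Shortened row in the shape of example.jsonl, for illustration only.
row = {
    "responses_create_params": {
        "input": [{"role": "user", "content": "Translate this into English: Hola"}]
    },
    "src_txt": "Hola",
    "trg_txt": "Hello",
    "src_lang": "es-us",
    "trg_lang": "en",
}
nothink_row = add_no_think(row)
```

The helper copies the row first, so the original stays usable for the thinking variant of the dataset.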